Thursday 8 September 2011

Global temporary tables or collections

Question:
You talk a lot about using collections, but I've never seen you write anything about global temporary tables. Wouldn't they be a good, fast alternative to collections? When should I use temporary tables and when should I use collections?
Answer:
You are correct: so far as I know, I have never written about global temporary tables. Never paid them much attention. Never seemed to have much use for them. But students at my trainings have asked about persistently and so, finally, I will address these handy structures.
Collections, as I hope you are all aware, are PL/SQL data structures that help you to create and manage lists of information within your PL/SQL program. Collections utilize process global area (PGA) memory vs. system global area (SGA) memory (caveat: you can define columns in persistent tables as collections; the memory for these collections comes out of the SGA). That is, the data in a collection is private to a particular Oracle session.
For more background on collections, check out:
- Oracle documentation
- Chapter 12 of Oracle PL/SQL Programming
Global temporary tables are relational tables that are not persistent (stored in the database, on disk) and whose data is not shared across sessions. The data in a global temporary table can be either transaction-specific (rows are deleted after a commit) or session-specific (rows are deleted when the session is terminated.
For more background on global temporary tables, check out this Oracle documentation.
And then of course you have good, old persistent, relational tables, whose data is stored on disk and is accessible to any schema that has the required privileges on the table.
So which data structure should you use, and when? Here are some recommendations:
• If you are manipulating data within your current session only and need to do so in the fastest manner possible, use collections whenever possible. By avoiding the SQL language and all the processing that goes with it, collections have much less overhead and greatly speed up program performance.
• If you need your data to persist beyond your session and be shared between schemas, you should use a standard relational table (hereafter referred to simply as tables).
• If you need to use the power of the SQL language to manipulate your data, you should use tables or global temporary tables. Use global temporary tables if you are manipulating data within your session only, and do not need to save that information to disk or share it across schemas. (Note: you can manipulate collections in SQL to some extent by applying the TABLE operator to a collection, but it seems to me that if you are going to be using lots of SQL, you should probably switch over to a global temporary table.)
• If you need to work with lists of data that include non-SQL datatypes, like Booleans, use collections.
• If your application has a large number of simultaneous users, using collections will lead to a higher consumption of system memory (each session will have its own PGA memory). You may need to fall back on global temporary tables; you will lose some performance, but avoid excessive memory consumption.
To measure the differences in performance and memory utilization, I built and ran the script you will find at the end of this answer. Here are the results of running this script on an Oracle Database 10g Release 2 database (all timings are), populating and then manipulating a table, global temporary table and collection with 100000 rows of data:
Compare persistent table, global temporary table and
collection for 100000 rows of data
session pga memory: 1,556,052
"Populate persistent table": 14.06 seconds
session pga memory: 1,752,660
"Populate temporary table": 7.46 seconds
session pga memory: 1,818,196
"Populate collection": .17 seconds
session pga memory: 30,064,212

"Get count of rows in persistent table": .01 seconds
"Get count of rows in temporary table": .02 seconds
"Get count of rows in collection": 0 seconds

"Read each row in persistent table": 4.45 seconds
"Read each row in temporary table": 4.65 seconds
"Read each row in collection": .04 seconds

"Lookup row in persistent table by name (unique index)": 4.7 seconds
"Lookup row in temporary table by name (unique index)": 4.8 seconds
"Lookup row in collection by name (work from copy)": .48 seconds
While my script does not test every circumstance under which you might use one of these structures, I believe the overall situation is clear: manipulating data in collections in PGA memory is substantially faster than doing so via tables (persistent or temporary) in SGA memory. The quickness comes, however, from a classic trade-off: memory vs. CPU. When working with collections, your programs will consume more PGA memory. For a multi-user application, this consumption can become an issue. Reliance on global temporary tables moves the memory utilization into the SGA, and therefore offers a more scalable solution when working with large numbers of users.
Performance and memory analysis script
DROP TABLE pt_id_name
/
CREATE TABLE pt_id_name (
ID NUMBER,
name VARCHAR2(100))
/
CREATE UNIQUE INDEX pt_id_name_index1 ON pt_id_name (ID)
/
CREATE UNIQUE INDEX pt_id_name_index2 ON pt_id_name (name)
/
DROP TABLE gtt_id_name
/
CREATE GLOBAL TEMPORARY TABLE gtt_id_name (
ID NUMBER,
name VARCHAR2(100))
/
CREATE UNIQUE INDEX gtt_id_name_index1 ON gtt_id_name (ID)
/
CREATE UNIQUE INDEX gtt_id_name_index2 ON gtt_id_name (name)
/

DECLARE
c_size PLS_INTEGER := 100000;

TYPE id_name_aat IS TABLE OF pt_id_name%ROWTYPE
INDEX BY PLS_INTEGER;

l_id_name id_name_aat;

--
TYPE by_name_aat IS TABLE OF pt_id_name%ROWTYPE
INDEX BY pt_id_name.name%TYPE;

l_by_name by_name_aat;
--
l_record pt_id_name%ROWTYPE;
--
l_start PLS_INTEGER;
l_count PLS_INTEGER;
l_index pt_id_name.name%TYPE;

PROCEDURE show_memory
/*
SELECT privileges required on:
SYS.v_$session
SYS.v_$sesstat
SYS.v_$statname

Here are the statements you should run from a SYSDBA account:

GRANT SELECT ON SYS.v_$sesstat TO schema;
GRANT SELECT ON SYS.v_$statname TO schema;
*/
IS
l_memory VARCHAR2 ( 100 );
BEGIN
SELECT nm.name || ': ' || TO_CHAR ( st.VALUE , '999,999,999,999')
INTO l_memory
FROM SYS.v_$sesstat st, SYS.v_$statname nm
WHERE st.statistic# = nm.statistic#
AND st.SID = SYS_CONTEXT ( 'USERENV', 'SID' )
AND nm.name = 'session pga memory';

DBMS_OUTPUT.put_line ( l_memory );
END show_memory;

PROCEDURE show_elapsed ( NAME_IN IN VARCHAR2 )
IS
BEGIN
DBMS_OUTPUT.put_line ( '"'
|| NAME_IN
|| '": '
|| TO_CHAR ( ( DBMS_UTILITY.get_cpu_time
- l_start
)
/ 100
)
|| ' seconds'
);
END show_elapsed;
BEGIN
DBMS_SESSION.free_unused_user_memory;
DBMS_OUTPUT.put_line
( 'Compare persistent table, global temporary table and collection for '
|| c_size
|| ' rows of data'
);
show_memory;
--
-- POPULATE STRUCTURES
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
INSERT INTO pt_id_name
VALUES ( indx, 'WOW' || indx );
END LOOP;

show_elapsed ( 'Populate persistent table' );
show_memory;
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
INSERT INTO gtt_id_name
VALUES ( indx, 'WOW' || indx );
END LOOP;

show_elapsed ( 'Populate temporary table' );
show_memory;
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
l_id_name ( indx ).ID := indx;
l_id_name ( indx ).name := 'WOW' || indx;
END LOOP;

show_elapsed ( 'Populate collection' );
show_memory;
--
-- COUNT OF ROWS
--
DBMS_OUTPUT.put_line ( '' );
l_start := DBMS_UTILITY.get_cpu_time;

SELECT COUNT ( * )
INTO l_count
FROM pt_id_name;

show_elapsed ( 'Get count of rows in persistent table' );
l_start := DBMS_UTILITY.get_cpu_time;

SELECT COUNT ( * )
INTO l_count
FROM gtt_id_name;

show_elapsed ( 'Get count of rows in temporary table' );
l_start := DBMS_UTILITY.get_cpu_time;
l_count := l_id_name.COUNT;
show_elapsed ( 'Get count of rows in collection' );
--
-- LOOKUP ROW BY INDEX
--
DBMS_OUTPUT.put_line ( '' );
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
SELECT *
INTO l_record
FROM pt_id_name
WHERE ID = indx;
END LOOP;

show_elapsed ( 'Read each row in persistent table' );
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
SELECT *
INTO l_record
FROM gtt_id_name
WHERE ID = indx;
END LOOP;

show_elapsed ( 'Read each row in temporary table' );
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
l_record := l_id_name ( indx );
END LOOP;

show_elapsed ( 'Read each row in collection' );
--
-- LOOKUP ROW BY NAME
--
DBMS_OUTPUT.put_line ( '' );
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
SELECT *
INTO l_record
FROM pt_id_name
WHERE name = 'WOW' || indx;
END LOOP;

show_elapsed ( 'Lookup row in persistent table by name (unique index)' );
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
SELECT *
INTO l_record
FROM gtt_id_name
WHERE name = 'WOW' || indx;
END LOOP;

show_elapsed ( 'Lookup row in temporary table by name (unique index)' );
--
l_start := DBMS_UTILITY.get_cpu_time;

FOR indx IN 1 .. c_size
LOOP
l_by_name ( l_id_name ( indx ).name ) := l_id_name ( indx );
END LOOP;

l_index := l_by_name.FIRST;

WHILE ( l_index IS NOT NULL )
LOOP
l_record := l_by_name ( l_index );
l_index := l_by_name.NEXT ( l_index );
END LOOP;

show_elapsed ( 'Lookup row in collection by name (work from copy)' );
--
ROLLBACK;
END;
/

2 comments: