Friday 26 August 2011

Use LIMIT to manage memory as you fetch

Question:
I need to query more than 100,000 rows from a table into my PL/SQL program. I want to use BULK COLLECT and from a memory consumption standpoint can only afford to populate a collection with 5000 rows at a time. I know that I should use the LIMIT clause, like this: FETCH my_cursor BULK COLLECT INTO my_collection LIMIT 5000; But how do I execute this statement so that it processes all the rows of data and not just the first 5000 rows?
Answer:
You are absolutely correct: using the LIMIT clause will constrain the amount of PGA (Process Global Area) memory consumed by your session when using a BULK COLLECT.
All you will need to do is place that FETCH statement inside a loop and make sure to terminate that loop by analyzing the contents of the collection, rather than the %NOTFOUND cursor attribute. I offer an example of the kind of code you will want to write below.
Suppose I define the following table and then populate it with 107,000 rows of data:

CREATE TABLE stuff (pk INTEGER, description VARCHAR2(100))
/

BEGIN
FOR indx IN 1 .. 107000
LOOP
INSERT INTO stuff
( pk, description
)
VALUES ( indx, 'Stuff ' || indx
);
END LOOP;

COMMIT;
END;
/
I then need to iterate through all rows and call the process_one_row procedure for each. I will use BULK COLLECT to improve the performance of my query, but I can only fetch 5000 rows at a time, due to memory constraints. My PL/SQL block will then look like this:
1 DECLARE
2 CURSOR stuff_cur
3 IS
4 SELECT *
5 FROM stuff;
6
7 TYPE stuff_aat IS TABLE OF stuff%ROWTYPE
8 INDEX BY BINARY_INTEGER;
9
10 l_stuff stuff_aat;
11
12 PROCEDURE process_one_row ( row_in IN stuff%ROWTYPE )
13 IS
14 BEGIN
15 NULL;
16 END;
17
18 PROCEDURE PRINT ( count_in IN PLS_INTEGER, notfound_in IN BOOLEAN )
19 IS
20 BEGIN
21 IF notfound_in
22 THEN
23 DBMS_OUTPUT.put_line ( 'Fetched '
24 || TO_CHAR ( count_in )
25 || ' rows and %NOTFOUND = TRUE'
26 );
27 ELSIF NOT notfound_in
28 THEN
29 DBMS_OUTPUT.put_line ( 'Fetched '
30 || TO_CHAR ( count_in )
31 || ' rows and %NOTFOUND = FALSE'
32 );
33 ELSE
34 DBMS_OUTPUT.put_line ( 'Fetched '
35 || TO_CHAR ( count_in )
36 || ' rows and %NOTFOUND IS NULL'
37 );
38 END IF;
39 END PRINT;
40 BEGIN
41 OPEN stuff_cur;
42
43 LOOP
44 FETCH stuff_cur
45 BULK COLLECT INTO l_stuff LIMIT 5000;
46
47 PRINT ( l_stuff.COUNT, stuff_cur%NOTFOUND );
48 EXIT WHEN l_stuff.COUNT = 0;
49
50 --
51 FOR l_row IN l_stuff.FIRST .. l_stuff.LAST
52 LOOP
53 process_one_row ( l_stuff ( l_row ));
54 END LOOP;
55 END LOOP;
56
57 CLOSE stuff_cur;
58* END;
Here is an explanation of this code:
Lines 2-5
Declare the cursor that will identify the 107,000 rows I need.
Lines 7-10
Declare a collection type that mimics the structure of the just_5000 relational table, and then declare an instance of that type.
Line 41
Open the cursor, but do not fetch any data.
Lines 43-55
Fetch the data inside a loop. Since I can only fetch 5000 rows at a time, I need to execute the fetch 21 times. Of course, you very rarely know the exact number of rows being fetched, so I will not use a numeric FOR loop (from 1 to 21). Instead, I use a simple loop.
Lines 44-45
Fetch the next 5000 rows from the result set, and put them in the l_just_5000 collection.
Line 48
Check to see if I have fetched all rows. Notice that I see if there is anything in the collection by calling the COUNT method. With non-bulk queries, I would usually write
EXIT WHEN stuff_cur%NOTFOUND
but if I take this approach in my program, I will not process the last 2000 rows of data. The 21st fetch will deposit the last 2000 rows into my collection, but will also set the %NOTFOUND attribute to TRUE (Oracle has fetched past the end of the result set).
Lines 51-54
Now it is time to process each of the individual rows. BULK COLLECT always fills sequentially from row 1, so I will use a numeric FOR loop to iterate through all the rows and call the process_one_row procedure.
Line 57
Close the cursor when done.
When you run the above script (appended below without line numbers), you will see this output:
Fetched 5000 rows and %NOTFOUND IS FALSE
...
Fetched 5000 rows and %NOTFOUND IS FALSE
Fetched 2000 rows and %NOTFOUND = TRUE
Fetched 0 rows and %NOTFOUND = TRUE
So when you use the LIMIT clause, you need to put your fetch inside a simple loop, and then after you fetch N rows, you will execute a FOR loop (within the simple loop) to iterate through those rows, taking the necessary application-specific action.
Full script:

CREATE TABLE stuff (pk INTEGER, description VARCHAR2(100))
/

BEGIN
FOR indx IN 1 .. 107000
LOOP
INSERT INTO stuff
( pk, description
)
VALUES ( indx, 'Stuff ' || indx
);
END LOOP;

COMMIT;
END;
/

DECLARE
CURSOR stuff_cur
IS
SELECT *
FROM stuff;

TYPE stuff_aat IS TABLE OF stuff%ROWTYPE
INDEX BY BINARY_INTEGER;

l_stuff stuff_aat;

PROCEDURE process_one_row ( row_in IN stuff%ROWTYPE )
IS
BEGIN
NULL;
END;

PROCEDURE PRINT ( count_in IN PLS_INTEGER, notfound_in IN BOOLEAN )
IS
BEGIN
IF notfound_in
THEN
DBMS_OUTPUT.put_line ( 'Fetched '
|| TO_CHAR ( count_in )
|| ' rows and %NOTFOUND = TRUE'
);
ELSIF NOT notfound_in
THEN
DBMS_OUTPUT.put_line ( 'Fetched '
|| TO_CHAR ( count_in )
|| ' rows and %NOTFOUND = FALSE'
);
ELSE
DBMS_OUTPUT.put_line ( 'Fetched '
|| TO_CHAR ( count_in )
|| ' rows and %NOTFOUND IS NULL'
);
END IF;
END PRINT;
BEGIN
OPEN stuff_cur;

LOOP
FETCH stuff_cur
BULK COLLECT INTO l_stuff LIMIT 5000;

PRINT ( l_stuff.COUNT, stuff_cur%NOTFOUND );
EXIT WHEN l_stuff.COUNT = 0;

FOR l_row IN l_stuff.FIRST .. l_stuff.LAST
LOOP
process_one_row ( l_stuff ( l_row ));
END LOOP;
END LOOP;

CLOSE stuff_cur;
END;
/

No comments:

Post a Comment