Friday, 26 August 2011

What Loop with BULK COLLECT collections

Question:
I noticed in your answer to a question regarding PL/SQL support for bi-directional cursors that you used BULK COLLECT to fill a collection, and then iterated through the contents of that collection using a WHILE loop, and the FIRST and NEXT methods.
As I am sure you know, BULK COLLECT queries always fill the collection sequentially, starting from row 1. Therefore, you could simply use a FOR loop (from 1 to collection.COUNT) to iterate through the collection elements. This approach requires less code, is more efficient, and is very readable, so why didn't you take that approach?
Answer:
Bryn is correct; I could have used a FOR loop in my bidirectional emulation example. You can, in fact, use any kind of PL/SQL loop to iterate through the indexes of a collection: FOR loop, WHILE loop, or a simple loop. And regardless of the loop type that you choose, you need to take into account the following concerns:
• If the collection is empty, you don't want the loop body to execute even a single time.
• If within the loop body you attempt to read an element at an index that does not exist, Oracle will raise the NO_DATA_FOUND exception.
Before I explain why I chose the WHILE loop, let's take a look at the code you would write when using each of these kinds of loops. Then I will respond to Bryn's suggestion and finish up with my advice to you.
Suppose I have a table that contains jokes:
CREATE TABLE jokes (
joke_id INTEGER,
title VARCHAR2(100),
text VARCHAR2(4000)
)
/
I need to iterate through all the jokes in the table and display the title of each joke. In each block below, I will use a BULK COLLECT query to retrieve all the rows of the table and move them into my collection. I will then write three different loops to iterate through the collection's contents.
As Bryn correctly points out, when I use BULK COLLECT to retrieve multiple rows of data, that data is deposited into a collection that is populated sequentially, starting with index value 1. If no rows are identified by the query, the collection is empty.
Here we go!
Using the FOR loop
DECLARE
TYPE joke_tt IS TABLE OF jokes%ROWTYPE INDEX BY PLS_INTEGER;
joke_cache joke_tt;
BEGIN
SELECT * BULK COLLECT INTO joke_cache FROM jokes;

FOR indx IN 1 .. joke_cache.COUNT
LOOP
DBMS_OUTPUT.put_line (' ' || joke_cache (indx).title);
END LOOP;
END;
/
With a FOR loop, I iterate from the first index, 1, to the sm COUNT of rows in the collection. If the collection is empty (there's nothing in the jokes table), sm COUNT returns 0, and the sm FOR loop body will not execute even once. Because I know that the collection will be densely-filled, I don't have to worry about raising the NO_DATA_FOUND exception.
Notice that with the FOR loop implementation, I don't have to: declare the local variable to hold the index value; set up the WHILE loop with a call to the FIRST method; explicitly code a loop terminating condition; or call the NEXT method to move to the next row. The FOR loop does all of that for me, implicitly.
Next stop: the WHILE loop.
Using the WHILE Loop
DECLARE
TYPE joke_tt IS TABLE OF jokes%ROWTYPE INDEX BY PLS_INTEGER;
joke_cache joke_tt;

l_index PLS_INTEGER;
BEGIN
SELECT * BULK COLLECT INTO joke_cache FROM jokes;

l_index := joke_cache.FIRST;

WHILE (l_index IS NOT NULL)
LOOP
DBMS_OUTPUT.put_line (' ' || joke_cache (l_index).title);
l_index := joke_cache.NEXT (l_index);
END LOOP;
END;
/
With the WHILE loop, I must declare a local variable to hold the current index into the collection. After populating the collection, I call the FIRST method to obtain the lowest-defined element in the collection. If the collection is empty, FIRST returns NULL and the loop body never executes. Otherwise, the title is displayed and then I move to the next defined index with the NEXT method.
Using the Simple Loop
DECLARE
TYPE joke_tt IS TABLE OF jokes%ROWTYPE INDEX BY PLS_INTEGER;
joke_cache joke_tt;

l_index PLS_INTEGER;
BEGIN
SELECT * BULK COLLECT INTO joke_cache FROM jokes;

l_index := joke_cache.FIRST;

LOOP
EXIT WHEN l_index IS NULL;
DBMS_OUTPUT.put_line (' ' || joke_cache (l_index).title);
l_index := joke_cache.NEXT (l_index);
END LOOP;
END;
/
The code using a simple loop varies only slightly from the WHILE loop implementation. I still call the FIRST method to obtain the lowest-defined element in the collection. But I move the termination condition for the loop inside the loop with the EXIT statement. Again, if the collection is empty, FIRST returns NULL and the loop immediately terminates. Otherwise, the title is displayed and then I move to the next defined index with the NEXT method.
Comparing the Approaches
So: three different types of loops can be used to iterate through the contents of a collection. Which should you use?
The FOR loop is a good choice for iterating through a collection's contents when you know for certain that the collection is densely-filled; that is, ever index between the first-defined row and the last-defined row is defined. If there are no gaps in the collection, then you can use the FOR loop with confidence and, as Bryn points out in his question, write and maintain less code in the process.
The WHILE loop is your best bet when you are working with a collection that might have a gap: one or more index values between lowest and highest that are not defined. In this situation, the use of the NEXT method neatly avoids the NO_DATA_FOUND exception.
You could also use the simple loop with a sparsely-filled collection. Generally, however, you should use a simple loop when you want to execute the body of the loop at least once. If the first line in your loop body is the EXIT WHEN statement, the WHILE loop is a more natural and intuitive implementation.
Should performance be a factor in choosing the type of loop?
Bryn also points out that the FOR loop implementation should be more efficient. How much of a factor should that be in my decision on which kind of loop to write?
My analysis shows that the FOR loop is, indeed, more efficient than the WHILE loop for iterating through a densely-filled collection, but the different is so slight as to be irrelevant for most applications. For example, in the anonymous block shown below, I populate a collection with 10,000,000 elements. I then iterate through that collection using FOR and WHILE loops.
The FOR loop consistently takes about 1.25 seconds to iterate through all the rows. The WHILE loop took about 3.15 seconds. So alternatively, you could argue that the FOR loop approach is almost 3X faster than the WHILE loop—but that only amounts to an extra 2 seconds over 10,000,000 elements. Since your collections are generally going to be several orders of magnitude smaller, I don't think this performance differential should be a factor in your selection of a collection-iterating technique.
Summary
In the comparison above, I noted that the FOR loop is most appropriate when you know that the collection is densely-filled. The bidirectional cursor emulation example fits that bill, and yet still I used a WHILE loop. In fact, I virtually always use a WHILE loop when iterating through a collection and I will continue to make that recommendation to my readers and students.
You can't really go wrong with a WHILE loop; it might run marginally slower than a FOR loop, but I doubt it's a performance hit that will ever be noticed. But it will ensure that you never raise NO_DATA_FOUND, whether your collection is sparsely- or densely-filled.
To use the FOR loop, I must make an assumption about the collection—namely, that it is either densely-filled or empty; the collection may not be sparse. (The FOR loop will raise NO_DATA_FOUND if it encounters a "gap" or undefined index location between 1 and the COUNT.) Today, that seems like an awfully safe assumption to make. After all, the FOR loop comes immediately after the SELECT statement.
What if over time, however, some additional code is placed between

SELECT * BULK COLLECT INTO joke_cache FROM jokes;
and
FOR indx IN 1 .. joke_cache.COUNT
and that code includes a call to the DELETE method to remove one or more rows from the collection? Something like this:
DECLARE
TYPE joke_tt IS TABLE OF jokes%ROWTYPE INDEX BY PLS_INTEGER;
joke_cache joke_tt;
BEGIN
SELECT * BULK COLLECT INTO joke_cache FROM jokes;

... lots of code

jokes.DELETE (jokes.LAST – 10);

... lots more code

FOR indx IN 1 .. joke_cache.COUNT
LOOP
DBMS_OUTPUT.put_line (' ' || joke_cache (indx).title);
END LOOP;
END;
/
Perhaps it should be obvious to the person adding this code that the FOR loop needs to be converted to a WHILE loop, but it is very likely that, in fact, this will be overlooked and a problem will have been introduced into the program. I'd much rather make the fewest assumptions possible in my code and thus make the code more resilient over time.
Having said that, when your code requires and expects the collection to be densely-filled, you should use the FOR loop implementation. Under these circumstances, a gap in the collection would indicate an error. This error should not be "covered up" by the WHILE loop; instead, you will want the error to interrupt normal program processing. You can do this by allowing the exception to propagate unhandled from the current block or you could place the FOR loop within its own block, then trap and log the error. Here is an example, courtesy of Bryn:
begin
for Idx in 1..Supposedly_Dense_Collection.Count()
loop
DBMS_Output.Put_Line(Supposedly_Dense_Collection(Idx));
end loop;
exception when No_Data_Found then
-- Your favorite logging technique
Raise_Application_Error(-20000, 'Supposedly_Dense_Collection wasn''t');
end;
Bryn also offers the following code to demonstrate (in his words) "a nice idiom for asserting that your collection is dense (actually, that it's unchanging after first being populated dense.) It relies on using the 'constant' keyword together with a forward-declared init function. I believe that this technique is relatively unknown."
procedure P is
begin
--...
<<Block_To_Process_Dense_Array>>declare
type Users_t is table of All_Users.Username%type;
function Users_In_DB return Users_t;
Users constant Users_t := Users_In_DB();
Nof_Users constant pls_integer := Users.Count();

function Users_In_DB return Users_t is
Users Users_t;
begin
select Username bulk collect into Users
from All_Users
order by Created;
if Users.Count() < 1 then raise Program_Error; end if;
return Users;
end Users_In_DB;

begin

-- Causes a compile error.
-- Users.Delete(1);

for j in 1..Nof_Users loop
DBMS_Output.Put_Line(Users(j));
end loop;

end Block_To_Process_Dense_Array;
--...
end P;

Bryn's continued explanation: "Block_To_Process_Dense_Array is set in a procedure just so that you can compile it first and then run it. It could occur anywhere at any depth of nesting. It's particularly nice for setting up a package body constant. (The idiom won't work for a spec constant unless the init function is in a separate compilation unit.)
"Note: I use Program_Error in these little demo programs just to keep the code shorter and not to detract from the main point with detail. In real life, you'd want to declare your own exception and/or use Raise_Application_Error();."
Script to compare performance of FOR and WHILE loops to iterate a collection

DECLARE
TYPE array_t IS TABLE OF PLS_INTEGER
INDEX BY PLS_INTEGER;

ARRAY array_t;
--
l_start_time PLS_INTEGER;
l_end_time PLS_INTEGER;
--
l_index PLS_INTEGER;
BEGIN
-- Populate the collection.
FOR n IN 1 .. 10000000
LOOP
ARRAY (n) := n;
END LOOP;

-- Cache the starting time.
l_start_time := DBMS_UTILITY.get_cpu_time;

FOR n IN 1 .. ARRAY.COUNT
LOOP
ARRAY (n) := n;
END LOOP;

-- Subtract end time from start time.
DBMS_OUTPUT.put_line ( 'FOR loop elapsed CPU time: '
|| TO_CHAR (DBMS_UTILITY.get_cpu_time - l_start_time)
);
--
l_start_time := DBMS_UTILITY.get_cpu_time;
l_index := ARRAY.FIRST;

WHILE (l_index IS NOT NULL)
LOOP
ARRAY (l_index) := l_index;
l_index := ARRAY.NEXT (l_index);
END LOOP;

DBMS_OUTPUT.put_line ( 'WHILE loop elapsed CPU time: '
|| TO_CHAR (DBMS_UTILITY.get_cpu_time - l_start_time)
);
END;
/

No comments:

Post a Comment