Thursday 8 September 2011

Do I really need to learn the syntax for BULK COLLECT and FORALL

Question:
I attended your seminar "You Wrote What?" in London, which was great by the way, and noted the emphasis you placed on using bulk binding to enhance performance. In particular you explained the trade-off between more complex code and performance.
I have just read an article by Bryn Llewellyn in SELECT Journal (2nd Qtr 2005) in which he states that the Oracle Database 10g optimizer transforms a cursor FOR loop into the equivalent of a bulk bind fetch. Does this mean that we PL/SQL programmers don't need to learn the syntax for bulk binding for Oracle Database 10g and later?
Answer:
Absolutely not! It is true (and very exciting) that the Oracle Database 10g PL/SQL optimizer will automatically optimize a cursor FOR loop to perform at levels comparable to a BULK COLLECT operation (run the script found at the end of this answer, written by Bryn Llewellyn, to verify this fact on your database). That is just one of the many reasons that you can expect to see a substantial boost in performance of PL/SQL-based applications merely by upgrading to Oracle Database 10Zg.
This very cool and non-trivial optimization definitely does not mean, however, that you can skip learning about and applying BULK COLLECT and FORALL (which I consider to be among the most important new features in Oracle PL/SQL since Oracle8i Database) in your application development. After all, the optimization shown in the script at the end of this answer only takes care of some of the scenarios in which bulk processing is highly recommended.
If, for example, you have repetitive DML statement execution within a cursor FOR loop (see the upd_for_dept procedure in Listing 1), the Oracle optimizer will improve the performance of the cursor FORloop, but it will not transform those row-by-row inserts and updates into FORALL-analogs. To fully optimize such code, you will have to shift to BULK COLLECT and then leverage the populated collections in subsequent FORALL statements.
Listing 1. Cursor FOR loop with DML operations
CREATE TABLE employee (
employee_id NUMBER(4),
last_name VARCHAR2(15),
first_name VARCHAR2(15),
middle_initial VARCHAR2(1),
job_id NUMBER(3),
manager_id NUMBER(4),
hire_date DATE,
salary NUMBER(7,2),
commission NUMBER(7,2),
department_id NUMBER(2)
)
/

CREATE TABLE employee_history (
employee_id NUMBER(4),
hire_date DATE,
salary NUMBER(7,2),
activity VARCHAR2(100)
)
/

CREATE OR REPLACE PROCEDURE log_error (
msg_in IN VARCHAR2 DEFAULT NULL
,code_in IN PLS_INTEGER DEFAULT NULL
)
IS
BEGIN
-- A "stub" program that simply displays the error information.
DBMS_OUTPUT.put_line (
'Error message: ' ||
NVL (msg_in, DBMS_UTILITY.FORMAT_ERROR_STACK));
DBMS_OUTPUT.put_line ('Error code: ' || NVL (code_in, SQLCODE));
END log_error;
/

1 CREATE OR REPLACE PROCEDURE upd_for_dept (
2 dept_in IN employee.department_id%TYPE
3 , newsal IN employee.salary%TYPE
4 )
5 IS
6 CURSOR emp_cur
7 IS
8 SELECT employee_id, salary, hire_date
9 FROM employee
10 WHERE department_id = dept_in;
11 BEGIN
12 FOR rec IN emp_cur
13 LOOP
14 BEGIN
15 INSERT INTO employee_history
16 ( employee_id, salary, hire_date
17 )
18 VALUES ( rec.employee_id, rec.salary, rec.hire_date
19 );
20
21 UPDATE employee
22 SET salary = newsal
23 WHERE employee_id = rec.employee_id;
24 EXCEPTION
25 WHEN OTHERS
26 THEN
27 log_error;
28 END;
29 END LOOP;
30* END upd_for_dept;
Listing 2 offers an example of such a rewrite, taking full advantage of FORALL functionality, including:
• Use of the LIMIT clause in the FETCH BULK COLLECT statement: LIMIT allows us to control the amount of PGA memory consumed by each session that executes a BULK COLLECT query. I use a LIMIT 100 clause in the query; this way, I retrieve 100 rows at a time, which will be substantially faster than row-by-row processing, but I do not consume excessive amounts of memory if the cursor identifies thousands of rows.
• SAVE EXCEPTIONS: this Oracle9i Database Release 2 feature tells Oracle to perform the DML statement for all the rows specified in the FORALL range clause, even if one or more of those statements raise an exception. When the FORALL is done, if at least one exception was encountered, Oracle raises the "bulk error exception". (ORA-24381; notice the use of the EXCEPTION_INIT pragma to give this error a name in my procedure.) You can then trap this exception and iterate through the contents of the SQL%BULK_EXCEPTIONS pseudo-collection to determine what errors had occurred. In this particular program, I will log the error and then remove the employee ID from the collection, so that this row in the employee will not be updated (thus matching the functionality in the original program). SAVE EXCEPTIONS is very smart and very useful functionality.
• FORALL with INDICES OF (an Oracle Database 10g Release 2 feature): Since I may have deleted one or more rows from the employee_ids collection, that collection may be sparse when the time comes to perform the FORALL INSERT statement. So I cannot ask the FORALL to process all rows between employee_ids.first and employee_ids.last; Oracle will raise an exception when it encounters an undefined row. Instead, I specify INDICES OF, and Oracle will then only process those rows defined in employee_ids. Cool stuff!
Listing 2. The fully transformed and optimized upd_for_dept program
CREATE OR REPLACE PROCEDURE upd_for_dept (
dept_in IN employee.department_id%TYPE
,newsal_in IN employee.salary%TYPE
)
IS
bulk_errors EXCEPTION;
PRAGMA EXCEPTION_INIT (bulk_errors, -24381);

TYPE employee_tt IS TABLE OF employee.employee_id%TYPE
INDEX BY BINARY_INTEGER;

employee_ids employee_tt;

TYPE salary_tt IS TABLE OF employee.salary%TYPE
INDEX BY BINARY_INTEGER;

salaries salary_tt;

TYPE hire_date_tt IS TABLE OF employee.hire_date%TYPE
INDEX BY BINARY_INTEGER;

hire_dates hire_date_tt;

CURSOR employees_cur
IS
SELECT employee_id
,salary
,hire_date
FROM employee
WHERE department_id = dept_in
FOR UPDATE;

PROCEDURE insert_history
IS
BEGIN
FORALL indx IN employee_ids.FIRST .. employee_ids.LAST SAVE EXCEPTIONS
INSERT INTO employee_history
(employee_id, salary, hire_date
)
VALUES (employee_ids (indx), salaries (indx), hire_dates (indx)
);
EXCEPTION
WHEN bulk_errors
THEN
FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
LOOP
log_error
( 'Unable to insert history row for employee '
|| employee_ids
(SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX)
,SQL%BULK_EXCEPTIONS (indx).ERROR_CODE
);
-- Delete this row so that the update will not take place.
employee_ids.DELETE (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX);
END LOOP;
END insert_history;

PROCEDURE update_employee
IS
BEGIN
-- Use Oracle10g INDICES OF to use what may now be
-- a sparsely-populated employee_ids collection.
FORALL indx IN INDICES OF employee_ids SAVE EXCEPTIONS
UPDATE employee
SET salary = newsal_in
,hire_date = hire_dates (indx)
WHERE employee_id = employee_ids (indx);
EXCEPTION
WHEN bulk_errors
THEN
FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
LOOP
-- Now I simply log the error...
log_error
( 'Unable to update salary for employee '
|| employee_ids
(SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX)
,SQL%BULK_EXCEPTIONS (indx).ERROR_CODE
);
END LOOP;
END update_employee;
BEGIN
OPEN employees_cur;

LOOP
-- Retrieve and process 100 rows at a time.
FETCH employees_cur
BULK COLLECT INTO
employee_ids
,salaries
,hire_dates
LIMIT 100;

EXIT WHEN employee_ids.COUNT = 0;

insert_history;
update_employee;
END LOOP;

END upd_for_dept;
/
This example should drive home the need for each and every PL/SQL programmer to learn the syntax of BULK COLLECT and FORALL, especially all of their nuances (from SAVE EXCEPTIONS and the LIMIT clause to INDICES OF and VALUES OF).
I consider these constructs to be among the most important elements of the PL/SQL language for programming in the 21st century. Study them and apply them throughout your code base!
Script demonstrating optimization of cursor FOR loop Author: Bryn Llewellyn
create or replace procedure putline ( approach in varchar2 , ol pls_integer) is
t0 integer; t1 integer;
cursor cur is select * from all_source;
one_row cur%rowtype;
type t is table of cur%rowtype index by pls_integer; many_rows t;
begin
t0 := Dbms_Utility.Get_Cpu_Time();


case approach
when 'implicit for loop' then
for j in cur loop
null;
end loop;


when 'explicit open, fetch, close' then
open cur;
loop
fetch cur into one_row;
exit when cur%NotFound;
end loop;
close cur;


when 'bulk fetch' then
open cur;
loop
fetch cur bulk collect into many_rows limit 100;
exit when many_rows.Count() < 1;
end loop;
close cur;


end case;
t1 := Dbms_Utility.Get_Cpu_Time();
Dbms_Output.Put_Line ('Timing for ' || approach ||
' with opt level ' || TO_CHAR (ol) || ' = ' || TO_CHAR (t1-t0 ));
end putline;
/
SET FEEDBACK OFF
SET SERVEROUTPUT ON
--------------------------------------------------------------------------------
-- Time it at optimize levels 1 and 2
--
-- Level 1


alter procedure putline compile plsql_optimize_level=1
/
call putline( 'implicit for loop' , 1) -- 2073
/
call putline( 'explicit open, fetch, close' , 1) -- 2063
/
call putline( 'bulk fetch' , 1) -- 252
/


--------------------------------------------------------------------------------
-- Level 2


alter procedure putline compile plsql_optimize_level=2
/
call putline( 'implicit for loop' , 2) -- 263 <<== NOTE THE CHANGE !
/
call putline( 'explicit open, fetch, close' , 2) -- as for level 1
/
call putline( 'bulk fetch' , 2) -- as for level 1
/

No comments:

Post a Comment