Friday 26 August 2011

Should I use SELECT INTO when I need to query a single row of data, or declare an explicit cursor and OPEN-FETCH-CLOSE

Question:
What are advantages and disadvantages of a SELECT INTO statement?
Answer:
I will start with my advice, then review the basics of SELECT INTOs (implicit cursors), then explain my advice in more detail. I also encourage you to check out the answer to the question "What's the best way to write a cursor in PL/SQL, anyway?" for additional insights on this topic.
My recommendations:
1. Use implicit cursors rather than explicit cursors whenever possible. They are generally more efficient.
2. Encapsulate your implicit cursors inside a function that returns the row of data identified by the cursor. Avoid repetition of the same SQL statement in your application.
3. Use BULK COLLECT whenever you need to fetch multiple rows of data.
Review of basics...
The term "SELECT INTO statement" refers to a single row, implicit query, as in:

DECLARE
l_id employees.employee_id%TYPE;
BEGIN
SELECT employee_id
INTO l_id
FROM employees
WHERE last_name = 'FEUERSTEIN';

...more code here
END;
/
The key features of SELECT INTO of which you should be aware are:
• As an implicit query, Oracle opens the cursor, parses the statement, fetches the single row, and closes the cursor.
• If the result set of the query is empty, Oracle raises NO_DATA_FOUND.
• If the result set of the query consists of more than one row, Oracle raises TOO_MANY_ROWS.
You can also use a variation of this statement to fetch multiple rows into one or more collections. Here is an example:

DECLARE
TYPE employee_ids_aat IS TABLE OF employees.employee_id%TYPE;
l_employee_ids employee_ids_aat;
BEGIN
SELECT employee_id
BULK COLLECT INTO l_employee_ids
FROM employees
WHERE last_name LIKE '%E%';
END;
/
The SELECT BULK COLLECT INTO statement also relies on an implicit query. It does not, however, raise NO_DATA_FOUND if no rows are queried. Instead, the COUNT of the collection populated by the query will return 0.
In contrast to the implicit query, I could also use an explicit cursor, which means I declare the cursor explicitly and then open, fetch from, and close the cursor myself. Here is an example:

DECLARE
l_id employees.employee_id%TYPE;

CURSOR feuerstein_cur
IS
SELECT employee_id
FROM employees
WHERE last_name = 'FEUERSTEIN';
BEGIN
OPEN feuerstein_cur;

FETCH feuerstein_cur INTO l_id;

CLOSE feuerstein_cur;
END;
/
So that's a quick recap of implicit and explicit cursors.
Explanation of recommendations
A long, long time ago in an Oracle galaxy far, far away, "gurus" and "experts" instructed programmers to avoid SELECT INTOs. We (yes, I was one of them) argued as follows:
"Implicit queries were always slower than their explicit counterparts. The explanation? Execution of implicit cursors must conform to the ANSI standard, which means Oracle fetches once to retrieve the row of interest, and a second time to see if TOO_MANY_ROWS should be raised. With the explicit cursor, you can fetch just once yourself. Therefore, explicits are more efficient."
A fine, abstract argument and at one point it was undoubtedly valid.
As I say, though, that was a long time ago. Oracle has since (as in, since Oracle 7.3) optimized SELECT INTO statements and they are always faster than explicit single row fetches. (to drive this point home, run the script – provided by Bryn Llewellyn, PL/SQL Product Manager, at the end of this answer) So I would say: the SELECT INTO should be your first choice when fetching single rows of data.
There is, however, one down-side to using the implicit cursor: the SELECT statement is inserted directly into the executable section of your block. If you need to execute the same query in another block, you will need to cut and paste the query. Maybe the query is a simple one and making such a copy is not a big deal. Generally, however, the presence of redundant SQL statements results in an application that is much harder to optimize and maintain.
I suggest strongly that whatever type of cursor you use, you should hide that cursor inside a function whose purpose in life is nothing more nor less than to return the data from that cursor. Here is an example

CREATE OR REPLACE FUNCTION id_from_name (
NAME_IN IN employees.last_name%TYPE )
RETURN employees.employee_id%TYPE
IS
l_id employees.employee_id%TYPE;
BEGIN
SELECT employee_id
INTO l_id
FROM employees
WHERE last_name = NAME_IN;

RETURN l_id;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
RETURN NULL;
WHEN TOO_MANY_ROWS
THEN
log_error ( 'Data integrity failure for employee name "'
|| NAME_IN
|| '".'
);
RAISE;
END id_from_name;
/
I return a NULL value to indicate that a row was not found, and I log the TOO_MANY_ROWS error along with the employee name that caused the problem. With this function in place, my earlier implicit cursor example now becomes nothing more than:

DECLARE
l_id employees.employee_id%TYPE;
BEGIN
l_id := id_from_name ('FEUERSTEIN');

IF l_id IS NOT NULL
THEN
...more code here
END IF;
END;
/
And I can, of course, call this function in any block of code that needs to look up an employee ID from the last name.
Of course, it can be annoying to have to write these functions for all your tables and queries against those tables. Generating this code is much more efficient and less error-prone. Oracle Designer generates such queries as part of its table API. You can also generate very robust table API packages with Qnxo (a code generation and reuse tool that I wrote, available at www.qnxo.com).
And, now, back to BULK COLLECT for a final point: one disadvantage to using BULK COLLECT with an implicit query is that you must then fetch all rows of the result set into your collection. If the result set is very large, this action could consume a large amount of memory. So Oracle offers a LIMIT clause for BULK COLLECT that allows you to fetch multiple, but not necessarily all, rows, thereby controlling the amount of memory consumed. Here is an example of using the LIMIT clause:

DECLARE
CURSOR emp_cur IS
SELECT * FROM employees;

TYPE employees_aat IS TABLE OF employees%ROWTYPE
INDEX BY BINARY_INTEGER;

l_employees employees_aat;
BEGIN
OPEN emp_cur;

LOOP
FETCH emp_cur
BULK COLLECT INTO l_employees LIMIT 100;
EXIT WHEN l_employees.COUNT = 0;

FOR indx IN 1 .. l_employees.COUNT
LOOP
process_data ( l_employees ( l_row ));
END LOOP;
END LOOP;

CLOSE emp_cur;
END;
/
Here's the catch: you cannot use the LIMIT clause with a SELECT BULK COLLECT INTO statement. So if you want to use BULK COLLECT, but need to fetch less than the full result set with each iteration, you will need to switch to an explicit cursor or cursor variable.
Peformance comparison script by Bryn Llewellyn

CONNECT d/p
EXECUTE u
CONNECT Usr/p

create table Things(n number primary key)
/
variable c number
begin :c := 1000000; end;
/
begin
for j in 1..:c loop
insert into Things(n) values(j);
end loop;
commit;
end;
/

create or replace procedure p(c in number) is
Expected_Checksum constant number := 500000500000;
Cursor Cur(x in number) is
select n from Things where n = x;
n number;
Checksum number := 0;
t0 number; t1 number;
begin
Checksum := 0;
t0 := DBMS_Utility.Get_Cpu_Time();
for j in 1..c loop

$if $$OFC $then
open Cur(j);
fetch Cur into p.n;
close Cur;

$else
select Things.n into p.n from Things where Things.n = j;

$end

Checksum := Checksum + n;
end loop;
t1 := DBMS_Utility.Get_Cpu_Time();
if Checksum is null or Checksum <> 500000500000 then
Raise_Application_Error(-20000, 'Bad Checksum: '||Checksum);
end if;

$if $$OFC $then
DBMS_Output.Put_Line('O-F-C: CPU: '||(t1-t0));
$else
DBMS_Output.Put_Line('Select into: CPU: '||(t1-t0));
$end
end p;
/

alter procedure p compile PLSQL_CCFlags = 'OFC:true'
/
begin p(:c); end;
/
alter procedure p compile PLSQL_CCFlags = 'OFC:false'
/
begin p(:c); end;
/

/* Typical results:
O-F-C: CPU: 3379
Select into: CPU: 2745
*/

No comments:

Post a Comment