Friday, 26 August 2011

SQL or PLSQL when performing DML on lots of rows

Question:
It's my understanding that the primary benefit of using the FORALL statement rather than the FOR loop is that the number of context switches between SQL and PL/SQL is minimized. Is this correct?
I prefer whenever possible (and it almost always is) to incorporate the logic of the PL/SQL block into a sometimes long and complicated SQL statement. This approach involves virtually no context switches. Furthermore, as a relational database is designed to process sets of data, it seems to be a more "natural" approach.
Do you agree with this method or do you recommend that I return to using collections in most cases?
Answer:
Alan, I understand your question to be the following: You have some complex logic that you need to run as part of a process of updating, inserting, or deleting multiple rows of data. You have PL/SQL at your disposal and are aware that you could:
• Write a PL/SQL program that applies your complex logic as you build a collection, which you then could use in a FORALL bulk processing statement to achieve a high performance solution.
• Write a SQL-only solution, assuming that you can find a way to implement all of your complex logic without the assistance of PL/SQL.
Given those assumptions, you are correct that the primary benefit of a move to FORALL is to reduce context switches and thereby improve performance of the associated DML statement. FORALL (and its corollary in the world of querying, BULK COLLECT) is one of the most wonderful advances in PL/SQL (first introduced in Oracle8i Database).
Context switches are expensive; generally, the fewer of them your code must make, the better it will perfor. If you can meet your requirements entirely within the SQL language, then you can avoid context switches between PL/SQL and SQL altogether. In most cases, this approach should help you achieve higher performance.
A classic example of such a solution is an INSERT-SELECT like this:
BEGIN
INSERT INTO my_new_table
SELECT my_data
FROM my_old_table
WHERE my_data IS NOT NULL;
END;
You could implement this single statement with a BULK COLLECT from my_old_table into a collection, followed by a FORALL statement to push the data into my_new_table. And such an implementation will be much faster than using, say, a cursor FOR loop that fetches each row separately, and performs row-by-row DML as well. It is unlikely, however, to match the performance of an INSERT-SELECT.
Having said that, there is no abstract, black-and-white answer when it comes to performance. It is quite possible that, depending on the complexity of your SQL statement, a BULK COLLECT-FORALL combination could actually be faster than "straight" SQL.
As a general rule, however, I agree that it is worth seeing if you can implement your requirement first in SQL. If there are obstacles to doing so, then consider PL/SQL.
Some factors that might direct you to a PL/SQL solution include the following:
• If you need to execute the same non-trivial logic in multiple places, not all of which are within an SQL statement, then there is a strong argument for moving the logic to PL/SQL and then reusing that code. You can, for example, call a PL/SQL function inside a SQL statement and also from a host environment like Java.
• Sometimes the way you have to implement a requirement in SQL is much more complex and hard to write/understand, than when written in a procedural language like PL/SQL. If this is the case, I would go with the PL/SQL implementation unless the tradeoff in performance is unacceptable. In this situation, you might also consider writing a table function, which is a function that can be called in the FROM clause of a query. This function returns a collection whose contents can be queried as if it were a relational table.
More information about tables functions is available here; you can find examples of table functions here.
• One downside of SQL processing compared to PL/SQL is that you lose some granularity of control available in PL/SQL. With SQL, you are generally confronted with an "all or nothing" scenario: if there is an error with any single row processed by the SQL statement, the entire statement is rejected. This might be just the behavior you want. In other cases, you might want to record the error and continue processing data. This would only be possible by executing the SQL statement from within PL/SQL, either with FORALL SAVE EXCEPTIONS or with row-by-row DML processing. What is "natural" depends on the requirements you need to meet.
Two final notes:
• If your requirements are such that you must wring out every possible nanosecond's worth of performance from you code, you may decide that it is necessary to copy and paste the same logic in multiple SQL statement and PL/SQL subprograms. This is an extreme step to take, and should be done so only as an exception and last resort.
• Try to keep up with the latest advances in each successive Oracle version. New SQL features can sometimes render prior PL/SQL-centric solutions outdated. "Upsert" (try to update and if the row does not exist, then insert) is a good example. This scenario presented a good argument for using PL/SQL, until Oracle provided the MERGE command in Oracle9i Database.

No comments:

Post a Comment