Wednesday 14 September 2011

Aborting execution across the stack

Question:
How can I abort the execution of my entire call stack in PL/SQL? In other words, program A calls program B, etc., and way down deep in my call stack, a exception occurs that is so nasty that I want it to stop processing all the way up the stack, regardless of what the enclosing blocks' exception handlers normally would do.
Answer:
Satya, PL/SQL does not offer an "abort" feature. Instead, it provides flexible error-raising and handling mechanisms, so that you can decide as you write your code specifically what sort of actions should take place under any particular set of circumstances.
The bottom line is that if a subprogram N levels down in your call stack has an exception section that looks like this:
EXCEPTION
WHEN OTHERS THEN NULL;
then there is absolutely no way that you will be able to abort execution, "swallow" the exception, and return control to the calling block (level N – 1). That block will then continue processing, as if an error had never occurred.
One would hope that your application does not have within it "Who cares?" error handlers. And, of course, you shouldn't just hope for such things. Instead, you should put measures in places to ensure that such code is never written to begin with.
Having said that, I will answer your question in two parts: First, I'll offer two key best practices for writing exception sections in your programs; and second, I'll show how you can build some semblance of this abort capability into your application. Such a solution won't help at all with when others then null, but it could offer some "added value" in other circumstances that might arise in your code.
Critical best practices for error handling
In terms of error handling best practices, please check out my answer to the question, "Recommendations for Error Handling?" In the context of your question, I would like to emphasize the following points:
a. Only write a WHEN OTHERS clause in your exception section if you need to gather information about an unanticipated error, such as the local context information (arguments of the procedure, particular data values with which the program is working, the error back trace and execution call stack, and so on). When you have gathered that information, you should always propagate an exception out of the OTHERS handler; either re-raise the current exception with the RAISE; statement, or raise another exception. If your program swallows up the exception and allows the outer block to continue processing, you are likely to encounter very unpredictable behavior and many more problems in your application.
b. Create a standard, generic error-logging mechanism, a procedure that can be shared by all the developers in your group and that grabs all possible context information and saves it to the error log repository (which could be a database table, a file, or some other structure). Set as a rule that every single WHEN clause must call this error logging procedure whenever information about the error needs to be recorded. And if this procedure is called in a WHEN OTHERS clause, it should be followed by the raise of an exception, as described earlier.
You should also put in place a comprehensive and thorough code review process, based on a checklist of best practices and application standards. This code review should check, among other things, all the exception sections of programs. If a WHEN OTHERS, make sure that the code propagates an exception out of the handler.
Building in an abort capability
If you feel that you can put the above best practices into place in your organization, then you should not read the rest of this answer. From asking attendees at the many trainings I do each year, however, it does not seem that very many PL/SQL development teams have a thorough code review process in place. In addition, it is quite difficult to get developers to follow standards that have been defined for an application. With seemingly impossible deadlines looming, developers are hard-pressed to remember, much less make time for, guidelines on how they should write their code.
So do we just throw up our hands and give up any hope of implementing standards? Not at all! My feeling is that the way to get developers to follow standards is to use the carrot, rather than stick, approach. If a development manager tells a programmer to follow a certain standard "or else," that programmer is likely to offer resistance. Suppose, however, that manager tells the same programmer "Check out this program; it will take care of all error logging steps for you, writing information out to a table, calling the appropriate Oracle functions to obtain context information, commit within an autonomous transaction to make sure the data is saved."
Since the program should save developers lots of time and effort, it is more likely that they will use it —and in the process, merely as a barely-noticed consequence, they will be following the standard! (Just don't point that out; you'll ruin everything.)
I implement the prototype of such a procedure below; it is called errpkg.record_error. Let's see how it might be used. Consider the following exception section:
EXCEPTION
WHEN NO_DATA_FOUND
THEN
errpkg.record_error (SQLCODE
, 'Company with ID ' || l_id || ' not found.');
WHEN OTHERS
THEN
errpkg.record_error;
END;
The NO_DATA_FOUND handler records the error, and then allows the outer block to continue execution. Is this a problem? Not necessarily. The requirements for this program might actually say that a NO_DATA_FOUND error is not particularly serious. The fact that it happenered simply needs to be recorded, and then processing may continue.
The OTHERS handler also simply records the error and does not propagate an exception. Is this a problem? Absolutely! This is a violation of the coding standard, and should be caught in the code review process. But if there is no code review process, or it is sloppy, let's see how I can now implement my record_error procedure so that even if the programmer has violated the standard in this way and swallowed the exception, a lower level block can still achieve an "abort." That is, the programmer can in effect override the default behavior as coded (poorly) by the developer.
In the text below, I offer a simple logging procedure that emulates an abort-like behavior within PL/SQL exception sections under certain circumstances. Do not use this code "as is" in a production application; it is merely a prototype that demonstrates the basic technique and the different code elements required. I will start by describing my requirement:
Provide a procedure that developers can call in their error handling WHEN clauses to record error information to a log table. They can also at that time request an abort action, so that after logging the error, the exception is propagated out of the outermost block unhandled, thereby stopping the current process.
Here are the steps I took to implement this functionality:
Step 1. Create a an error log table.
This log records the error code, error message, call stack, error back trace (available only in Oracle Database 10g).
CREATE TABLE error_log (
err_code INTEGER,
err_message VARCHAR2(4000),
call_stack VARCHAR2(4000),
back_trace VARCHAR2(4000),
created_on DATE,
created_by VARCHAR2(30)
)
/
Step 2. Create an error logging package specification that includes bailout functionality.
CREATE OR REPLACE PACKAGE errpkg
IS
-- Errpkg raises this exception if the user requests an abort.
-- It should never be raised under other circumstances.
e_abort_failure EXCEPTION;
en_abort_failure PLS_INTEGER := -20999;
PRAGMA EXCEPTION_INIT (e_abort_failure, -20999);

-- Reset status to "not aborting".
PROCEDURE reset_status;

-- Record an error and indicate if this error should initiate
-- an abort.
PROCEDURE record_error (
err_code_in IN error_log.err_code%TYPE DEFAULT SQLCODE
,err_message_in IN error_log.err_message%TYPE DEFAULT NULL
,abort_in IN BOOLEAN DEFAULT FALSE
);
END errpkg;
/
In this package specification, I declare a special exception that will be raised when the programmer requests "abort" behavior. The reset_status procedure resets the package state to indicate that we are not currently aborting the process. The record_error procedure accepts the error code and message, plus a third argument to indicate whether or not this error should initiate an abort.
Step 3. Create the package body.
Now I will implement the abort functionality. Here is my package body; an explanation follows.
CREATE OR REPLACE PACKAGE BODY errpkg
IS
g_abort BOOLEAN DEFAULT FALSE;

PROCEDURE reset_status
IS
BEGIN
g_abort := FALSE;
END reset_status;

PROCEDURE record_error (
err_code_in IN error_log.err_code%TYPE DEFAULT SQLCODE
,err_message_in IN error_log.err_message%TYPE DEFAULT NULL
,abort_in IN BOOLEAN DEFAULT FALSE
)
IS
PROCEDURE insert_row
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
INSERT INTO error_log
(err_code
,err_message
,call_stack
,back_trace
,created_on, created_by
)
VALUES (err_code_in
,SUBSTR (NVL (err_message_in
,DBMS_UTILITY.format_error_stack
)
,1
,4000
)
,SUBSTR (DBMS_UTILITY.format_call_stack, 1, 4000)
,SUBSTR (DBMS_UTILITY.format_error_backtrace, 1, 4000)
,SYSDATE, USER
);
COMMIT;
EXCEPTION
WHEN OTHERS
THEN
-- Must rollback on exit from autonomous transaction.
-- Display generic message to indicate problem.
ROLLBACK;
DBMS_OUTPUT.put_line ('Unable to write to error log!');
DBMS_OUTPUT.put_line ('Error:');
DBMS_OUTPUT.put_line (DBMS_UTILITY.format_error_stack);
END insert_row;
BEGIN
IF g_abort
THEN
RAISE e_abort_failure;
END IF;

insert_row;

IF abort_in
THEN
g_abort := abort_in;
RAISE_APPLICATION_ERROR (en_abort_failure
, 'Abort exception invoked by developer!');
END IF;
END record_error;
END errpkg;
/
As I mentioned earlier, this is a very simple demonstration prototype. Here are the steps taken in my package body to implement the needed functionality:
1. I declare a package level variable to hold the abort status.
2. The reset_status procedure sets this value to FALSE—that is, "not aborting."
3. In the executable section of record_error, the very first thing I do is check to see if I am currently in abort mode. If so, I do not log the error, I simply raise the special abort exception. Notice that by taking this approach, I do not clog up the error log with "false" errors that are being handled solely because the abort mode is causing immediate propagation of the original exception.
4. If I am not currently aborting the process, I log the error with a call to the insert_row procedure. This local module is defined as an autonomous transaction so that I can commit the new log information immediately, and preserve it even if the exception goes unhandled and triggers a rollback. If an error occurs here, I will rollback and then display a message to the screen. Again, in a production application, you will need to do something more elaborate here, perhaps write a message out to a trace file.
5. Finally, check the status of the abort parameter. If set to TRUE, I turn on the abort mode in the package and then raise the abort exception.
So you can see that once I have logged an "abort error" and set the flag, all subsequent calls to record_error with simply propagate the abort exception, without performing any sort of logging.
Let's take at how I can use this package to trigger an abort (see abort.tst).
I create a procedure that raises an exception. In my handler, I log the error and then set the abort status to TRUE:
CREATE OR REPLACE PROCEDURE proc1
IS
BEGIN
RAISE NO_DATA_FOUND;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
errpkg.record_error (SQLCODE
,'No company found!'
,abort_in => TRUE
);

END proc1;
/
Then I call proc1 from an outer block that includes a WHEN OTHERS exception section:
BEGIN
errpkg.reset_status;
proc1;

EXCEPTION
WHEN OTHERS
THEN
errpkg.record_error (SQLCODE, 'Error from proc1!');
END;
/
Notice that I pass FALSE for abort status; this will be ignored, however, because an inner block's exception section has already turned on "abort mode." Consequently, when the top-level anonymous block attempts to record the error (and not propagate an exception), the errpkg.record_error procedure interrupts the process and simply propagates out the exception.
The result is that the abort exception goes unhandled and displays this error stack:
ORA-20999:
ORA-06512: at "QNXO_DEVELOPMENT.ERRPKG", line 55
ORA-06512: at line 8
ORA-20999: Abort exception invoked by developer!
and (assuming my error log was empty before running the anonymous block) the contents of my error log contains just this single row of data:
ERR_CODE ERR_MESSAGE CREATED_O CREATED_BY
--------- ----------------- --------- ------------
100 No company found! 20-JUL-05 SCOTT
Remember: this technique will only work if you make sure that every exception handler calls errpkg.record_error before any other code is executed in that handler.
You must also take care to reset the abort status after handling the error. Otherwise, the package will propagate the abort exception incorrectly.
So that is how you can emulate something that is akin to an "abort process" inside a PL/SQL execution stack (with all the constraints mentioned), but the best solution to this problem is to make sure that all WHEN OTHERS handlers propagate an exception to the outer block, so that exceptions are never swallowed. In this way, your special "abort" exception, as well as any other unexpected exception, will propagate unhandled to the outermost block and terminate execution.

No comments:

Post a Comment