Wednesday 14 September 2011

Best way to tell if row exists

Question:
What is the best approach to determining whether or not a row exists in a table? Currently I have implemented two different methods: 1. Do a SELECT count(*) INTO lv_tester FROM table. Then IF lv_tester > 0 THEN Some Logic.... ELSE RAISE error END IF; 2. Use a Cursor Loop and a local variable to see if a row was fetched inside the loop. I was curious if there is a better approach that I am missing.
Answer:
Your first approach might be fine, depending on your usage. The second approach is generally something to be avoided. Let's take a closer look.
One of the wonderful things about PL/SQL (and pretty much any robust programming language) is that there are so many different ways to implement the functionality you need.
One of the terrible things about PL/SQL (and pretty much any robust programming language) is that there are so many different ways to implement the functionality you need.
What is the chance, a developer might ponder, that she or has found the optimal implementation? It's enough to keep you up late at night, pondering the paradoxes of life as a programmer!
The general question you have asked is: "How do I best check to see if a row exists?"
To provide the most comprehensive answer, I would also need to hear the answers to these questions:
• What do you want to do after you get the answer to your question?
• Are you checking for existence of a row based on unique criteria (for example, a primary key) or criteria that might return multiple rows?
• Do you expect the check for existence to mostly fail or mostly succeed?
I will address these questions in my answer below.
What do you want to do after you get the answer to your question?
Suppose that if the row you are interested in does exist, you then need to fetch that record, change the values of the columns in that row, and more. In this case, you should simply execute a query or function (containing a query) to fetch the row of interest. Otherwise you will query the database to check for existence, and then query again to retrieve the row.
If you are going to use a query directly in your application code, you can rely on the fact that the implicit SELECT INTO will raise NO_DATA_FOUND if no rows are found, or TOO_MANY_ROWS if more than one is found.
My preference, however, is to hide the query inside a function that will generally trap the NO_DATA_FOUND and quietly return a record with null for every field of the return record. After calling this function, you can examine the contents of the row (now in the form of a PL/SQL record) to see if you fetched successfully (as in: is the record's primary key field null? This would be impossible unless no row was found). In this case, the answer to "Does the row exist?" comes "free" with the fetching of the row. Here is an example:

CREATE OR REPLACE FUNCTION one_employee (
employee_id_in IN employees.employee_id%TYPE
)
RETURN employees%ROWTYPE
IS
l_return employees%ROWTYPE;
BEGIN
SELECT *
INTO l_return
FROM employees
WHERE employee_id = employee_id_in;

RETURN l_return;
EXCEPTION
-- The record will be empty...
WHEN NO_DATA_FOUND THEN RETURN l_return;
END one_employee;
Now, if you only need to know if the row exists, but not actually manipulate the contents of that row, you are best off calling a function that returns a Boolean (TRUE = row found, FALSE = row not found), hiding the implementation details.
Multiple rows or single row for your criteria?
Now, suppose you only need to know if the row exists, but you don't need to manipulate its contents. In this case, we can then focus our attention on construction of a function to return a Boolean value: does the row exist? The way we implement the query insdide the function depends in part on whether or not you expect to find more than one row matching your criteria. If you are checking for existence of a primary key or some other unique value, it makes perfect sense to use SELECT COUNT(*), as you mentioned in your question. It will either return 0 or 1, and you can return the Boolean value based on that result. Here is an example:

CREATE OR REPLACE FUNCTION employee_exists (
employee_id_in IN employees.employee_id%TYPE
)
RETURN BOOLEAN
IS
l_count PLS_INTEGER;
BEGIN
SELECT COUNT ( * )
INTO l_count
FROM employees
WHERE employee_id = employee_id_in;

RETURN l_count = 1;
END employee_exists;
When this function defined, I can then see if a row exists for a department as follows:

BEGIN
IF employee_exists (l_emp_id)
THEN
...
END IF;
END;
/
Now I do not have to write a query each time I need to get this information; I simply call my pre-defined subprogram, as though it were a service offered to me by my PL/SQL development environment.
By taking this approach (encapsulating queries behind functions), I:
• reuse logic, rather than write the same code again and again. This leads to a big boost in my productivity and in the maintainability of my code base.
• make it easier to optimize, fix or evolve my lookup query, if needed. It is in just one place, so I make the changes just once and the impact is felt across the entire application.
If your query might return multiple rows then I suggest you avoid SELECT COUNT(*) (unnecessary overhead; you don't really care how many rows there are; you just want to know if there is at least one) and instead write a function that executes a SELECT INTO based on the appropriate WHERE clause. The following program tells us, for example, if there are any employees in a given department.

CREATE OR REPLACE FUNCTION employee_in_dept_exists (
department_id_in IN employees.department_id%TYPE
)
RETURN BOOLEAN
IS
l_count PLS_INTEGER;
BEGIN
SELECT 1
INTO l_count
FROM employees
WHERE department_id = department_id_in;

RETURN TRUE;
EXCEPTION
WHEN NO_DATA_FOUND THEN RETURN FALSE;
WHEN TOO_MANY_ROWS THEN RETURN TRUE;
END employee_in_dept_exists;
Notice that I select the literal value of 1 (one). It doesn't really matter what I return, since that information is not relevant here; the only issue is whether or not I return something.
I have avoided COUNT(*) against the table of interest since it would be inefficient, and also answers the wrong question. You should only use COUNT(*) if you need to know the number of rows that match a certain criteria (or if your criteria will identify at most one row, as shown earlier).
There is also no reason to use a cursor FOR loop, since I am only retrieving a single piece of information: a Boolean, TRUE or FALSE, does the row exist? You should only use a cursor FOR loop if you intend to iterate through all the rows in a result set.
No, in this case, the best approach to rely on that old standby and efficient querying form: the SELECT INTO statement (aka, the implicit one row query). SELECT INTO will return data if a single row is found (which means the row exists). SELECT INTO will raise NO_DATA_FOUND if no row is found (the row does not exist). And it will raise TOO_MANY_ROWS if more than one row is found (in which case the row or rows do exist).
Is this approach the most efficient? Hard to say. There are many ways to write a query to check for a single row. Certainly, there is overhead associated with raising and handling exceptions, which would make this approach less desirable if you know that in many cases, your function will return FALSE (no row found).
For example, Bryn Llewellyn, PL/SQL Product Manager, offered this alternative when he reviewed my answer:

CREATE OR REPLACE FUNCTION employee_in_dept_exists (
department_id_in IN employees.department_id%TYPE
)
RETURN BOOLEAN
IS
l_count PLS_INTEGER;
BEGIN
SELECT COUNT ( * )
INTO l_count
FROM DUAL
WHERE EXISTS ( SELECT 1
FROM employees
WHERE department_id = department_id_in );

RETURN l_count = 1;
END employee_in_dept_exists;
I am sure you will have your own ideas about how to implement your query. However you write that SELECT statement, though, be sure to put it inside a function, so that it can be easily re-used; the details of the implementation are hidden; and you can easily change that implementation as future releases of Oracle make new approaches possible.

1 comment:

  1. How do you find data for your new articles, which exact search networks or techniques do you mostly rely on?

    ReplyDelete