Friday, 5 August 2011

Writing subquery


A subquery is a query within a query—that is, a query in which we reuse the result in another query. They are sometimes called nested queries or subselects. Subqueries are new in MySQL 4.1. They are a feature that users have been requesting for a long time. Subqueries do not add new functionality, but queries are often more readable using subqueries, rather than a complex set of joins.

We have already looked at one kind of subquery without realizing it. The multi-table deletes and updates that we looked at in Chapter 5, "Inserting, Deleting, and Updating Data," are a type of specialized subquery.

In this chapter, we will look at subqueries in SELECT statements.

Two basic kinds of subqueries have been added to MySQL:

Derived table subqueries

Expression subqueries

Expression subqueries appear in the WHERE clause of a SELECT statement. These come in two further types:

Subqueries that return a single value or row

Subqueries that are used to test a Boolean expression

We'll look at an example of each of these in turn.

Using Derived Table Subqueries
Derived table subqueries allow us to list a query in the FROM clause of another query. This effectively allows us to create a temporary table and add it to the query. For example, consider the following simple query:



select employeeID, name from employee where job='Programmer';

It should be obvious that this will retrieve the names and ids of all the programmers. We can use this query within another for another useful result:



select programmer.name
from (select employeeID, name from employee where job='Programmer')
        as programmer,
        assignment
where programmer.employeeID = assignment.employeeID;

In this case, we have used the subquery (select employeeID, name from employee where job='Programmer') to make a derived table that contains only the rows employeeID and name, and we have aliased this table to call it 'programmer'. We can then query it as we would any other table. In this case, we use it to find out which programmers have worked on outside assignments to garner the following results:



+--------------+
| name         |
+--------------+
| Nora Edwards |
+--------------+
1 row in set (0.01 sec)

Using Single-Value Subqueries
As in the preceding section, we will start with a simple query:



select max(hours) from assignment;

This will retrieve a single value, representing the maximum number of hours an employee has worked on an assignment. We are using a MySQL function we have not yet mentioned: max(), which finds the greatest value in a particular column. We will revisit max() in Chapter 8, "Using MySQL Built-In Functions with SELECT." Using the result returned by this type of function is a common application of single-value subqueries.

As before, we can go on and use this query within another query.

Single-value subqueries return a single column value and are then typically used for comparison. For example, consider the following query:



select e.employeeID, e.name
from employee e, assignment a
where e.employeeID = a.employeeID
and a.hours = (select max(hours) from assignment);

Here, we are looking for what might be termed the company's hardest working employee: Who is the employee who has put in the greatest number of hours on a particular day on an assignment?

Here are the results of running this query:



+-------------+--------------+
| employeeID  | name         |
+-------------+--------------+
|        7513 | Nora Edwards |
+-------------+--------------+
1 row in set (0.42 sec)

We can also write subqueries that return a row, rather than a single value, although this is often of limited usefulness. We will not look at an example of this here.

Using Boolean Expression Subqueries
Boolean expression subqueries are used to check our query against some special functions that return a Boolean expression. These special functions are IN, EXISTS, and (grouped together) ALL, ANY, and SOME.

We can use the keyword IN to check against a set of values. Consider the following query:



select name
from employee
where employeeID not in
      (select employeeID
       from assignment);

This query has the same effect as the one we looked at using LEFT JOIN. It allows us to look for employees who are not in the set of employees who have worked on an outside assignment. The keyword IN lets us look for values in a set of values. We get the same result here as we did from our LEFT JOIN query:



+---------------+
| name          |
+---------------+
| Ajay Patel    |
| Candy Burnett |
| Ben Smith     |
+---------------+
3 rows in set (0.45 sec)

Interestingly enough, another use of IN is to just test against a listed a set of values, as shown here:



select name
from employee
where employeeID not in (6651, 1234);

The EXISTS keyword works in a slightly different fashion than the IN keyword. In queries using EXISTS, we actually use data from the outer query in the subquery. This is sometimes called a correlated subquery.

For example, consider the following query:



select e.name, e.employeeID
from employee e
where not exists
          (select *
           from assignment
           where employeeID = e.employeeID);

Here, we are looking for employees who have never worked on an outside assignment.

In the subquery, we are looking at rows in the assignment table and we are checking for rows where the assignment employeeID is the same as the employee.employeeID. The e.employeeID comes from the outer query. Here's what MySQL is actually doing: For each row in the employee table, we check the results of the subquery, and if there is no matching row (WHERE NOT EXISTS), we add the employee's details to the result set.

Although some users find this an easier syntax to understand, we can get the same result using a LEFT JOIN as we did before. It will also be more efficient and, therefore, faster to execute if written as a left join. This query yields exactly the same results:



+---------------+-------------+
| name          | employeeID  |
+---------------+-------------+
| Ajay Patel    |        6651 |
| Candy Burnett |        9006 |
| Ben Smith     |        9842 |
+---------------+-------------+
3 rows in set (0.00 sec)

The ALL, ANY, and SOME keywords are used to compare against a set of values returned from a subquery.

Suppose that Nora Edwards, who you may remember is our hardest working programmer, wants to establish that nobody works longer hours than the programmers. She comes up with the following query to establish this fact:



select e.name
from employee e, assignment a
where e.employeeID = a.employeeID
and a.hours > all
         (select a.hours
         from assignment a, employee e
         where e.employeeID = a.employeeID
         and e.job='Programmer');

The subquery finds the list of hours worked on assignments by programmers in the company. It then looks for any other employees who have worked on an assignment for longer than these programmers, using the check a.hours > ALL (the programmers' hours).

You will not be surprised to know that this query returns no rows, establishing that, in fact, nobody in this company works harder than the programmers.

No comments:

Post a Comment