Friday, 5 August 2011

Using Joins to Run Queries over Multiple Tables

Given that we've focused on a normalized database design in which information is stored in multiple tables, selecting from a single table is, well, limited. What make well-designed, relational databases interesting are the relationships—that is, the links between the tables. When information is selected from multiple tables, these links are called joins. Let's begin by looking at queries that link two tables.

Joining Two Tables
Consider the following query:

select employee.name, department.name
from employee, department
where employee.departmentID = department.departmentID;

You will see that we have specified two tables in the FROM clause instead of one. In this case, we want to retrieve employees' names and the names of the departments they work for. The results are as shown here:

+---------------+--------------------------+
| name | name |
+---------------+--------------------------+
| Ben Smith | Finance |
| Ajay Patel | Research and Development |
| Nora Edwards | Research and Development |
| Candy Burnett | Research and Development |
+---------------+--------------------------+
4 rows in set (0.42 sec)

How did we get these results? First of all, we selected columns that appeared in two tables. (You will notice that we used the dot notation to differentiate between the employee name and the department name, as discussed in Chapter 6.) To do this, we needed to include both of these tables in the FROM clause.

The most interesting thing about this query is the WHERE clause. If we run this query without the WHERE clause, as

select employee.name, department.name
from employee, department;

we get the following result:

+---------------+--------------------------+
| name | name |
+---------------+--------------------------+
| Ajay Patel | Finance |
| Nora Edwards | Finance |
| Candy Burnett | Finance |
| Ben Smith | Finance |
| Ajay Patel | Research and Development |
| Nora Edwards | Research and Development |
| Candy Burnett | Research and Development |
| Ben Smith | Research and Development |
| Ajay Patel | Human Resources |
| Nora Edwards | Human Resources |
| Candy Burnett | Human Resources |
| Ben Smith | Human Resources |
| Ajay Patel | Marketing |
| Nora Edwards | Marketing |
| Candy Burnett | Marketing |
| Ben Smith | Marketing |
+---------------+--------------------------+
16 rows in set (0.01 sec)

The first query, with the WHERE clause, shows employees listed with the correct department, whereas the second query shows all possible combinations of employees and departments, with no way of knowing which rows are correct and which are spurious! This result set, containing all possible rows from combining the two tables, is called the Cartesian product of the two tables.

The WHERE clause is clearly important in finding the result rows we want. When performing a join, we refer to the condition or set of conditions used to join tables together as the join condition. In this case, the condition we used was employee.departmentID = department.departmentID, which is the link between the tables based on the foreign keys in our original schema.

When you need to find information that spans more than one table, you need to use these links between tables to find the information you are seeking. Sometimes this means looking for a path from the information you have to the information you want. We'll come back to this idea in the next section.

One other point to note is that if you look at the previous result sets, both of the columns are headed "name" because this is what each column is called in the context of its own table. We could improve the readability of the results by using aliases, as shown here:

select employee.name as employeeName, department.name as departmentName
from employee, department
where employee.departmentID = department.departmentID;

This will give the following results:

+---------------+--------------------------+
| employeeName | departmentName |
+---------------+--------------------------+
| Ben Smith | Finance |
| Ajay Patel | Research and Development |
| Nora Edwards | Research and Development |
| Candy Burnett | Research and Development |
+---------------+--------------------------+
4 rows in set (0.55 sec)

The presentation of this result set is easier to understand than the previous ones.

Joining Multiple Tables
The principle behind joining more than two tables is the same.

Consider the situation of wanting to find out which department's employees have been assigned to work for the client called Telco Inc. How can we find this information?

We know the client name, and looking this up in the client table gives us the clientID. We can use this to find matching assignments in the assignment table and to see which employees have worked for the client. We get their employeeIDs from the assignment table and can then look these up in the employee table to find out the ids of the departments they work for. From this information, we can then finally go to the department table and look up the department name!

Having worked out this path across four tables, we need to write a query that reflects our logic. This is as follows:

select department.name
from client, assignment, employee, department
where client.name='Telco Inc'
and client.clientID = assignment.clientID
and assignment.employeeID = employee.employeeID
and employee.departmentID = department.departmentID;

These are the results of running this query:

+--------------------------+
| name |
+--------------------------+
| Research and Development |
+--------------------------+
1 row in set (0.00 sec)

Looking at the query we wrote, you can see that we needed to list all the tables in the path that we followed and then join conditions to make each link from table to table. We have a regular condition—client.name = 'Telco Inc'—and a series of join conditions. Notice that we had three join conditions to join four tables.

You can use this as a guideline to check whether you have all the join conditions you need. If you are joining n tables, in most cases, you will have a link between each pair of tables, and therefore have n-1 join conditions. The joins in this example are shown in Figure 7.1. You can very clearly see why four tables require three (n-1) joins.

Figure 7.1. Joining four tables requires three joins.


Joining a Table to Itself—Self Joins
As well as joining tables to other tables, we can join a table to itself. Why would we want to do this? Sometimes we are looking for relationships between rows in a table. Imagine that we want to know the names of employees who work in the same department as Nora Edwards. To do this, we need to find the departmentID Nora works for from the employee table and then look in the employee table for employees who work for that department.

We can do this as shown here:

select e2.name
from employee e1, employee e2
where e1.name = 'Nora Edwards'
and e1.departmentID = e2.departmentID;

You can see that, in this query, we have actually declared two different aliases for the employee table. Effectively, we are telling MySQL that we want to pretend we have two separate tables, e1 and e2, which just happen to contain the same data. Then, we can simply join them as we would any two other tables. We begin by finding Nora's row in e1 (where e1.name='Nora Edwards'). We can then look for rows in e2 that have the same departmentID as Nora (e1.departmentID = e2.departmentID).

This can take a little getting used to, but as long as you can pretend you are dealing with two different tables, you shouldn't have too much trouble.

These are the results of the preceding query:

+---------------+
| name |
+---------------+
| Ajay Patel |
| Nora Edwards |
| Candy Burnett |
+---------------+
3 rows in set (0.00 sec)

These are all the employees who work in the same department as Nora. You can see that Nora herself is included in the list. We can easily add a further condition to exclude her from the result set:

select e2.name
from employee e1, employee e2
where e1.name = 'Nora Edwards'
and e1.departmentID = e2.departmentID
and e2.name != 'Nora Edwards';

No comments:

Post a Comment