Databases: Removing Duplicates with DISTINCT

Friday, 5 August 2011

Removing Duplicates with DISTINCT

You can use the keyword DISTINCT in your queries to specify that you do not want to see duplicate results. For example, consider the following query:

select job
from employee;

This will return the following data:

+-----------------------+
| job |
+-----------------------+
| Programmer |
| Programmer |
| Systems Administrator |
| DBA |
+-----------------------+
4 rows in set (0.01 sec)

Note that the data Programmer appears twice. This is because this value occurs in two rows. This query has simply returned the complete list of values in the job column of this table.

Now, consider this query:

select distinct job
from employee;

This will return the following rows:

+-----------------------+
| job |
+-----------------------+
| Programmer |
| Systems Administrator |
| DBA |
+-----------------------+
3 rows in set (0.04 sec)

Here, the duplicates have been removed.

In this case, the difference doesn't seem like that big of a deal—sure, the second set of results is a little neater, but it doesn't really improve things much. It would be a little more important for a big table with a lot of repetition, but it would still be presenting accurate information.

On the other hand, consider this:

mysql> select count(job) from employee;
+------------+
| count(job) |
+------------+
| 4 |
+------------+
1 row in set (0.01 sec)

This query tells us there are four values in the job column. This is kind of misleading. It certainly doesn't tell us that there are four different values in the job column because we can see by inspection of the data that there are only three.

It is relatively easy to type the previous query by mistake when what you actually meant was this:

select count(distinct job) from employee;

This will give you the following result:

+---------------------+
| count(distinct job) |
+---------------------+
| 3 |
+---------------------+
1 row in set (0.05 sec)

This tells us how many different values are in the job column, a more useful piece of information.

Databases

Pages

Friday, 5 August 2011

Removing Duplicates with DISTINCT

No comments:

Post a Comment