Wednesday 14 September 2011

Normalization De-Normalization

Normalization: It's the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
1. Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.
2. Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3. Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table.
4. Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related.
5. Isolate Semantically Related Multiple Relationships - There may be practical constrains on information that justify separating logically related many-to-many relationships.
1st Normal Form (1NF)
Def: A table (relation) is in 1NF if
1. There are no duplicated rows in the table.
2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
3. Entries in a column (attribute, field) are of the same kind.
Note: The order of the rows is immaterial; the order of the columns is immaterial.
Note: The requirement that there be no duplicated rows in the table means that the table has a key (although the key might be made up of more than one column—even, possibly, of all the columns).
2nd Normal Form (2NF)
Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key.
Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies."
3rd Normal Form (3NF)
Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
Boyce-Codd Normal Form (BCNF)
Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
4th Normal Form (4NF)
Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
5th Normal Form (5NF)
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.
Domain-Key Normal Form (DKNF)
Def: A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and domains.

De-Normalization:
De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access. You may apply De-normalization in the process of deriving a physical data model from a logical form.

Example for Normalization

The Daisy Hill Puppy farm currently has the following data items for puppies:
Puppy Number
+ Puppy Name
+ Kennel Code
+ Kennel Name
+ Kennel Location
+ { Trick ID
+ Trick Name
+ Trick Where Learned
+ Skill Level
}1...n
When the file was first set up, it was thought that no puppy could possibly learn more than 10 tricks before graduation, but one puppy, Snoopy, learned more than 40 tricks before his adoption by Charles Brown. This forced a restructuring of the entire file, but it also left most of the file filled with meaningless spaces since the average puppy learned only three of the forty possible tricks. In addition, more and more people are asking for puppies, which know specific tricks. Right now the entire file of 25,693 puppies needs to be searched to find a match.

1NF -- Eliminate Repeating Groups
Make a separate table for each set of related attribute, and give each table a primary key. How?

Split the table into two tables:
Puppy Table = Puppy Number
+ Puppy Name
+ Kennel Code
+ Kennel Name
+ Kennel Location

Trick Table = Puppy Number + Trick ID
+ Trick Name
+ Trick Where Learned
+ Skill Level
It would help to have all the trick data to be in a separate table. This would decrease the table size, since every puppy who knows trick 43 would have its name, Roll Over, too. If the last puppy to know trick 43 were run over by a car, his record will be removed and we will lose all knowledge of trick 43.

2NF -- Eliminate Redundant Data
If an attribute depends on only part of a multi-valued key, remove it into a separate table. How?

Split the Trick Table into two: Tricks and Puppy Tricks.
Tricks = Trick ID
+ Trick Name

Puppy Tricks = Puppy Number + Trick ID
+ Trick Where Learned
+ Skill Level

Puppy Table = Puppy Number
+ Puppy Name
+ Kennel Code
+ Kennel Name
+ Kennel Location
Now suppose no puppies from the K9-Kennel were currently stored in the database. There then remains no record of K9-Kennel's existence!

3NF -- Not Dependent On Key
If attributes do not contribute to a description of the key, remove them to a separate table. How?

Split Puppy Table into two tables: Puppies and Kennels
Tricks = Trick ID
+ Trick Name

Puppy Tricks = Puppy Number + Trick ID
+ Trick Where Learned
+ Skill Level
Puppies = Puppy Number
+ Puppy Name
+ Kennel Code

Kennels = Kennel Code
+ Kennel Name
+ Kennel Location
Suppose we want to add a new attribute to the Puppy-Trick table, "Costume." This way we can find puppies who can "sit-up- and-beg" while wearing a Ted Turner mask. The two need not relate, for example, a puppy may be able to walk upright while wearing a wet suit, but may not be able to do both at the same time. The new Puppy Tricks table will look like this:
Puppy Tricks = Puppy Number + Trick ID + Costume
+ Trick Where Learned
+ Skill Level

4NF -- Isolate Independent Multiple Relationships
No table may contain two or more 1:n or n:m relationships that are not directly related. Here Puppy Number specifies a well-defined set of Trick IDs and well-defined sets of costumes. Thus there is multi-dependency. How do we prevent anomalies in this case?

Split Puppy Tricks in two, Puppy Tricks and Puppy Costumes.
Puppy Tricks = Puppy Number + Trick ID
+ Trick Where Learned
+ Skill Level

Puppy Costumes = Puppy Number + Costume

Normalization and Normal Forms Example

Example Table "company", not normalized:
Company
Attribute
Name John Walker Jim Bim

Pers.ID 1 1

Dept. No. 1 2
Dept. Name Europe USA
Project No. 5, 7, 8 5, 8
Project Name Soap, Pasta, OliveOil Soap, OliveOil
Job in Project Analysis, Leader, Marketing Leader, Leader
Salary/h 13, 18, 15 18, 18

First NF (1NF): A relation cannot have repeating fields or groups (no field must have more than one value):
Employees
Attribute
Name John Walker Jim Bim

Pers. ID 1 1

Dept. No. 1 2
Dept. Name Europe USA
Projects
Attribute

Project No. 5 7 8
Project Name Soap Pasta OliveOil
Project Assignment
Attribute

Project No. 5 5 7 8 8

Pers. ID 1 1 1 1 1

Dept. No. 1 2 1 1 2
Job in Project Analysis Leader Leader Marketing Leader
Salary/h 13 18 18 15 18

Second NF (2NF): 1NF + every non-key field must be functionally dependent on all of the key.
Here: In the relation "Employees" the "Dept. Name" depends only on the "Dept. No." but not on "Pers. ID", thus not on all parts of the key:
Employees
Attribute
Name John Walker Jim Bim

Pers. ID 1 1

Dept. No. 1 2
Departments
Attribute

Dept. No. 1 2
Dept. Name Europe USA

The relations "Projects" and "Project Assignment" can remain as they are:
Projects
Attribute

Project No. 5 7 8
Project Name Soap Pasta OliveOil
Project Assignment
Attribute

Project No. 5 5 7 8 8

Pers. ID 1 1 1 1 1

Dept. No. 1 2 1 1 2
Job in Project Analysis Leader Leader Marketing Leader
Salary/h 13 18 18 15 18
________________________________________
Third NF (3NF): There must be no transitive dependency that is an attribute depends on one or more other non-key attributes.
Here: The relations "Employees", "Depends" and "Projects" can remain as they are.
But in "Projects Assignment" the salary depend only on the job, thus only indirectly on the key. This is a transitive dependency. We have to add a "Payment" relation.
Employees
Attribute
Name John Walker Jim Bim

Pers. ID 1 1

Dept. No. 1 2
Departments
Attribute

Dept. No. 1 2
Dept. Name Europe USA

Projects
Attribute

Project No. 5 7 8
Project Name Soap Pasta OliveOil
Payment
Attribute

Job Analysis Marketing Leader
Salary/h 13 15 18

Project Assignment
Attribute

Project No. 5 5 7 8 8

Pers. ID 1 1 1 1 1

Dept. No. 1 2 1 1 2
Job in Project Analysis Leader Leader Marketing Leader

Note: We could add an"Assignment No." in the "Project Assignment" relation as new key to avoid having only key attributes.
Now the DB is in 3NF: No Data-field modification should necessitate any other "follow-up" adjustment in any other data-field, because of transitive dependencies

No comments:

Post a Comment