Database Normalization
Database normalization is the process of refining the data in accordance with a series of normal forms. This is done to reduce data redundancy and improve data integrity. This process divides large tables into small tables and links them using relationships.
The concept of normalization was invented by Edgar Codd and he introduced First Normal form before moving ahead with other types of normalization forms like Second and Third Normal forms.
Normalization forms
There are further enhancements to theory of normalization and it is still being developed. There is even 6th normal form but in any practical scenario, normalization achieves its best shape in 3rd Normal form.
Key terms
Column – Attribute
Row – Tuple
Table – Relation
Entity – Any real world object that makes sense
Step by Step Normalization Example
Let us look at a library table that maintains all the books they rent out in one single table
Now let us push this data from various normal forms and see how we can refine the data.
1NF - First Normal Form
The rules of the first normal form are
Each table cell should contain a single/atomic value
Every record in the table must be unique
Let us first convert the Books_Main_Table into 1NF
As per the 1NF rules, our Books Main Table looks good. Before we proceed with 2NF and 3NF, we need to understand key columns.
Key / non-key columns
Any column (or group of columns) in a table which can uniquely identify each row is known as key column.
Example
Phone number
Email id
Student roll number
Employee id
These are some columns that will always remain unique to every record inside the table.
Such columns are known as key columns inside the table. Any column apart from key columns is known as non-key column.
Primary key
A primary key is a single column value which uniquely identifies each record in a table. In RDBMS, primary key must satisfy below
Primary key must be unique
Primary key cannot be null
Every record will have primary key value
Composite Key
Sometimes its hard to define unique records with one single column. In such cases, we can have two or more columns that uniquely identify each record in a table. Such columns are known as composite key.
For example
Name + Address
First Name + DOB + Father Name
Now that we know about key / non-key columns, let us move to 2NF.
2NF - Second Normal Form
The rules of the second normal form are
Table must be in 1NF
Every non-key attribute must be fully dependent on key attributes
We see that our Books_Main_Table does not have any primary key, in such cases, we will have to introduce a new key column like Membership ID.
To make Books_Main_Table into 2NF, we need to see how columns are closely related:
Membership ID has a salutation, name, and address
Membership ID has books issued on their name
With this logic in mind, we will have to divide our Books_Main_Table into two tables
If you see the above tables, we have Membership ID in both tables but in Membership_Details_table, it is a primary key column and in Books_Issued_table, it is a non-key column.
Foreign Key
Till now we have seen Primary key and composite key. A foreign key refers to a primary key column of another table. This helps in connecting two tables (and defines a relation between two tables). A foreign key must satisfy below
Foreign key column name can be different than primary key column name
Unlike primary key, then need not be unique (see Books_Issued_Table above)
Foreign key column can be null even though primary key column cannot
Reason for Foreign key
When a user tries to insert a record into Books_Issued_Table and if there is no membership ID exists in Membership_Details_Table, it will be rejected. This way, we maintain data integrity in RDBMS.
If there is no record with Membership ID in the parent table, it will be rejected and database will throw an error.
3NF - Third Normal Form
The rules of the third normal form are
Data must be in 2NF
No transitive functional dependencies
What is a transitive dependency?
In simple terms, if changing a non-key column causes any other non-key column to change, then it's called a transitive dependency.
In our example, if we change the Full Name of the customer, it might change the Salutation
Final 3NF Tables
To move the Membership_Details_Table into 3NF, we need to further divide the table into below
We have divided the Membership_Details_Table into a new Salutation_table.
Assignment
If you see the Books_Issued_Table, it still does not have a key column. What do you think should be the key column for the Books_Issued_Table? Or do we need to introduce a new column?
Further Read