Database normalization is the process of refining the data in accordance with a series of normal forms. This is done to reduce data redundancy and improve data integrity. This process divides large tables into small tables and links them using relationships.
The concept of normalization was invented by Edgar Codd and he introduced First Normal form before moving ahead with Second and Third Normal forms.
- 1st Normal Form
- 2nd Normal Form
- 3rd Normal Form
- Boyce-Codd Normal Form
- 4th Normal Form
- 5th Normal Form
- 6th Normal Form
There are further enhancements to theory of normalization and it is still being developed. There is even 6th normal form but in any practical scenario, normalization achieves its best shape in 3rd Normal form.
- Column – Attribute
- Row – Tuple
- Table – Relation
- Entity – Any real world object that makes sense
Let us look at a library table that maintains all the books they rent out in one single table:
Now let us push this data from various normal forms and see how we can refine the data.
1NF (First Normal Form)
The rules of the first normal form are:
- Each table cell should contain a single/atomic value
- Every record in the table must be unique
Let us first convert the Books_Main_Table into 1NF
As per the 1NF rules, our Books Main Table looks good. Before we proceed with 2NF and 3NF, we need to understand key columns.
Key / non-key columns
Any column (or group of columns) in a table which can uniquely identify each row is known as key column. For example:
- Phone number
- Email id
- Student roll number
- Employee id
These are some columns which will always remain unique to every record inside the table.
Such columns are known as key columns inside the table. Any column apart from key columns is known as non-key column.
A primary key is a single column value which uniquely identifies each record in a table. In RDBMS, primary key must satisfy below:
- Primary key must be unique
- Primary key cannot be null
- Every record will have primary key value
Sometimes its hard to define unique records with one single column. In such cases, we can have two or more columns that uniquely identify each record in a table. Such columns are known as composite key. For example:
- Name + Address
- First Name + DOB + Father Name
Now that we know about key / non-key columns, let us move to 2NF.
2NF (Second Normal Form)
The rules of the second normal form are:
- Table must be in 1NF
- Every non-key attribute must be fully dependent on key attributes
We see that our Books_Main_Table does not have any primary key, in such cases, we will have to introduce a new key column like Membership ID.
To make Books_Main_Table into 2NF, we need to see how columns are closely related:
- Membership ID has a salutation, name and address
- Membership ID has books issued on their name
With this logic in mind, we will have to divide our Books_Main_Table into two table:
If you see the above tables, we have Membership ID in both tables but in Membership_Details_table, it is a primary key column and in Books_Issued_table, it is a non-key column.
Till now we have seen Primary key and composite key. A foreign key refers to a primary key column of another table. This helps in connecting two tables (and defines a relation between two tables). A foreign key must satisfy below:
- Foreign key column name can be different than primary key column name
- Unlike primary key, then need not be unique (see Books_Issued_Table above)
- Foreign key column can be null even though primary key column cannot
Reason for Foreign key
When a user tries to insert a record into Books_Issued_Table and if there is no membership ID exists in Membership_Details_Table, it will be rejected. This way, we maintain data integrity in RDBMS.
If there is no record with Membership ID in the parent table, it will be rejected and database will throw an error.
3NF (Third Normal Form)
The rules of the third normal form are:
- Data must be in 2NF
- No transitive functional dependencies
What is transitive dependency?
In simple terms, if changing a non-key column causes any other non-key column to change, then its called as transitive dependency.
In our example, if we change Full Name of the customer, it might change Salutation
Final 3NF Tables
To move the Membership_Details_Table into 3NF, we need to further divide the table into below:
We have divided the Membership_Details_Table into a new Salutation_table.
If you see the Books_Issued_Table, it still does not have a key column. What do you think should be the key column for the Books_Issued_Table? Or do we need to introduce a new column?