top of page
DBAGenesis_png.png

Database Normalization

Database normalization is the process of refining the data in accordance with a series of normal forms. This is done to reduce data redundancy and improve data integrity. This process divides large tables into small tables and links them using relationships. 


The concept of normalization was invented by Edgar Codd and he introduced First Normal form before moving ahead with other types of normalization forms like Second and Third Normal forms.





Normalization forms


There are further enhancements to theory of normalization and it is still being developed. There is even 6th normal form but in any practical scenario, normalization achieves its best shape in 3rd Normal form.


Key terms

  • Column – Attribute

  • Row – Tuple

  • Table – Relation

  • Entity – Any real world object that makes sense



Step by Step Normalization Example


Let us look at a library table that maintains all the books they rent out in one single table

database normalization - step by step normalization

Now let us push this data from various normal forms and see how we can refine the data.



1NF - First Normal Form


The rules of the first normal form are

  • Each table cell should contain a single/atomic value

  • Every record in the table must be unique

Let us first convert the Books_Main_Table into 1NF

database normalization - rules of the 1nf

As per the 1NF rules, our Books Main Table looks good. Before we proceed with 2NF and 3NF, we need to understand key columns.


Key / non-key columns


Any column (or group of columns) in a table which can uniquely identify each row is known as key column.


Example

  • Phone number

  • Email id

  • Student roll number

  • Employee id

These are some columns that will always remain unique to every record inside the table.


Such columns are known as key columns inside the table. Any column apart from key columns is known as non-key column.


Primary key


A primary key is a single column value which uniquely identifies each record in a table. In RDBMS, primary key must satisfy below

  • Primary key must be unique

  • Primary key cannot be null

  • Every record will have primary key value


Composite Key


Sometimes its hard to define unique records with one single column. In such cases, we can have two or more columns that uniquely identify each record in a table. Such columns are known as composite key.


For example

  • Name + Address

  • First Name + DOB + Father Name

Now that we know about key / non-key columns, let us move to 2NF.



2NF - Second Normal Form


The rules of the second normal form are

  • Table must be in 1NF

  • Every non-key attribute must be fully dependent on key attributes

We see that our Books_Main_Table does not have any primary key, in such cases, we will have to introduce a new key column like Membership ID.


To make Books_Main_Table into 2NF, we need to see how columns are closely related:

  • Membership ID has a salutation, name, and address

  • Membership ID has books issued on their name

With this logic in mind, we will have to divide our Books_Main_Table into two tables

database normalization - 2NF
database nomalization - 2nf table

If you see the above tables, we have Membership ID in both tables but in Membership_Details_table, it is a primary key column and in Books_Issued_table, it is a non-key column.


Foreign Key


Till now we have seen Primary key and composite key. A foreign key refers to a primary key column of another table. This helps in connecting two tables (and defines a relation between two tables). A foreign key must satisfy below

  • Foreign key column name can be different than primary key column name

  • Unlike primary key, then need not be unique (see Books_Issued_Table above)

  • Foreign key column can be null even though primary key column cannot


database nomalization - 2nf table with primary key column and foreign key column

Reason for Foreign key


When a user tries to insert a record into Books_Issued_Table and if there is no membership ID exists in Membership_Details_Table, it will be rejected. This way, we maintain data integrity in RDBMS.


If there is no record with Membership ID in the parent table, it will be rejected and database will throw an error.



3NF - Third Normal Form


The rules of the third normal form are

  • Data must be in 2NF

  • No transitive functional dependencies


What is a transitive dependency?

In simple terms, if changing a non-key column causes any other non-key column to change, then it's called a transitive dependency.


In our example, if we change the Full Name of the customer, it might change the Salutation

database normalization - transitive dependency

Final 3NF Tables


To move the Membership_Details_Table into 3NF, we need to further divide the table into below

database nomalization - 3nf table
database normalization - 3nf table
database normalization - 3nf table

We have divided the Membership_Details_Table into a new Salutation_table.



Assignment


If you see the Books_Issued_Table, it still does not have a key column. What do you think should be the key column for the Books_Issued_Table? Or do we need to introduce a new column?



Further Read


Become a top notch dba with DBA Genesis
Become a DBA with DBA Genesis.png
bottom of page