Different Normal Forms (1st, 2nd, 3rd, BCNF, DKNF)
NORMALIZATION AND ITS TYPES (1NF,2NF,3NF,BCNF,4NF AND DKNF)
Normalization is the process of organizing data in a relational database to reduce redundancy and dependency, resulting in a well-structured database schema. There are several normal forms, each addressing specific types of anomalies that can occur in a database.
TYPES OF ANOMALIES
Insertion Anomaly: If a tuple is inserted in referencing relation and referencing attribute value is not present in referenced attribute, it will not allow insertion in referencing relation.
If we try to insert a record into the COURSE table with SNO = 5, it will not be allowed because there is no corresponding SNO = 5 in the STUDENT table.
STUDENT TABLE
SID |
SNAME |
SPHONE |
SCOUNTRY |
SAGE |
1 |
Ram |
234567 |
Nepal |
20 |
2 |
Shyam |
234516 |
Nepal |
29 |
3 |
Sujit |
987654 |
Nepal |
28 |
4 |
Suresh |
456789 |
Nepal |
20 |
COURSE TABLE
SID |
COURSENO |
COURSENAME |
1 |
C1 |
DBMS |
2 |
C2 |
OS |
1 |
C2 |
OS |
Deletion and Updation Anomaly: If a tuple is deleted or updated from referenced relation and the referenced attribute value is used by referencing attribute in referencing relation, it will not allow deleting the tuple from referenced relation.
Example: If we want to update a record from COURSE with SNO =1, We have to update it in both rows of the table. If we try to delete a record from the STUDENT table with SNO = 1, it will not be allowed because there are corresponding records in the COURSE table referencing SO = 1. Deleting the record would violate the foreign key constraint, which ensures data consistency between the two tables.
NEED OF NORMALIZATION
Reducing Data Redundancy
Normalization helps to minimize duplicate data by organizing the database into tables with clear relationships. This not only saves storage space but also makes the database more efficient to manage.
Reducing Update Anomalies
Normalization minimizes update anomalies, which occur when data inconsistencies arise due to partial updates.
Preventing Deletion and Update Anomalies
Normalization helps prevent deletion anomalies, which occur when deleting data leads to unintended loss of other valuable data.
TYPES OF NORMALIZATION
First Normal Form (1NF):
A relation is in 1NF if it contains only atomic (indivisible) values, and each column has a unique name. Table should not contain multivalued attributes.
Example:
S_Id |
S_Name |
Course |
1 |
Arun |
C/C++ |
2 |
Bikash |
Python |
3 |
Chetan |
DBMS/C |
To convert to 1NF, we would break the Courses column into individual rows:
S_Id |
S_Name |
Course |
1 |
Arun |
C |
1 |
Arun |
C++ |
2 |
Bikash |
Python |
3 |
Chetan |
DBMS |
3 |
Chetan |
C |
Second Normal Form (2NF):
A relation is in 2NF if it is in 1NF and all non-prime attributes are fully functionally dependent on the primary key (no partial dependency on any part of the composite key if the primary key is composite or candidate key).
Table should not contain any partial dependency.
Non-Prime Attribute: Attributes that do not participate in formation of Candidate Key .
Prime Attribute: Attributes that participate in formation of Candidate Key .
Example:
Student_ID |
Course_ID |
Course_Fee |
1 |
110 |
2000 |
2 |
112 |
5000 |
3 |
105 |
2500 |
{Student_ID and Course_ID->Course_Fee}
Prime Attribute= Student_ID and Course_ID
Non Prime Attribute=Course_Fee
As attribute Course_Fee only depend on Course_ID and not on Student_ID , it doesnot follow 2nf rule as a non-prime attribute only depends on one prime attribute.
To convert to 2NF, we break above table into two tables:
Student_ID |
Course_ID |
1 |
110 |
2 |
112 |
3 |
105 |
Course_ID |
Course_Fee |
110 |
2000 |
112 |
5000 |
105 |
2500 |
Now, Non-prime attribute Course_Fee is fully dependent on the Course_ID, hence in 2NF.
Third Normal Form (3NF):
A relation is in 3NF if it is in 2NF, and there are no transitive dependencies or no non-prime attribute depends on another non-prime attribute.
Any non-prime attribute should only depend on prime attribute.
Example:
Here,
Prime Attribute=Student_ID
Non-Prime Attribute=State, City
In this table ,
Student_ID->City and City->State
Student_ID-> State
Here Student_ID is indirectly dependent on City , also City is dependent on State and both are Non-Prime attributes .
Also, City
is dependent on State
, which creates a transitive dependency because City
is indirectly dependent on Student_ID
through State
.
Student_ID |
State |
City |
1 |
CA |
Birganj |
2 |
SA |
Pokhara |
3 |
TA |
Kathmandu |
To convert to 3NF, we would break above table:
State Table
State |
City |
CA |
Birganj |
SA |
Pokhara |
TA |
Kathmandu |
Student Table
Student_ID |
State |
1 |
CA |
2 |
SA |
3 |
TA |
Now, both tables are in 3NF:
- In the States Table,
City
is directly dependent onState
. - In the Students Table,
State
is directly dependent onStudent_ID
.
Boyce-Codd Normal Form (BCNF):
Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form (3NF). A table is in BCNF if it is in 3NF and, for every functional dependency (A → B), A is a superkey/key. A superkey/key is a set of one or more columns that uniquely identify a row in a table.
Example of BCNF:
Consider a table that records which course a student is enrolled in and who teaches that course:
Student_ID |
Course |
Instructor |
1 |
C |
P |
2 |
C++ |
B |
3 |
DBMS |
A |
Step 1: First Normal Form (1NF)
The table is already in 1NF because all values are atomic and there are no repeating groups.
Step 2: Second Normal Form (2NF)
The table is also in 2NF. Here, StudentID
and Course
together can be the composite key. There are no partial dependencies as all non-key attributes depend on the entire composite key.
Step 3: Third Normal Form (3NF)
The table is in 3NF as well because there are no transitive dependencies; all non-key attributes (Instructor) depend directly on the composite key (StudentID, Course
).
However, this table is not in BCNF. Here's why:
Identifying the problem:
The functional dependency Course → Instructor, here Course alone is not a Super Key in the above table.
Decomposition to BCNF:
To convert the table to BCNF, we need to decompose it so that every determinant is a superkey. We split the table into two:
Course Table
Course |
Instructor |
C |
P |
C++ |
B |
DBMS |
A |
Enrollment Table
Student_ID |
Course |
1 |
C |
2 |
C++ |
3 |
DBMS |
Now, let's check if these tables meet the BCNF criteria:
- In the Courses Table,
Course
is a key, and it determinesInstructor
(Course → Instructor
). - In the Enrollments Table,
StudentID is a key and it
uniquely identifies each row, and there are no partial or transitive dependencies.
Fourth Normal Form( 4NF):
For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
- It should be in the Boyce-Codd Normal Form.
- And, the table should not have any Multi-valued Dependency.
A table is said to have multi-valued dependency, if the following conditions are true,
- For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency.
- Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
- And, for a relation
R(A,B,C)
, if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.
SID |
COURSE |
HOBBY |
1 |
C1 |
Playing |
2 |
C2 |
Reading |
1 |
C2 |
Dancing |
Here a student opts multiple course and has multiple hobbies, and the course and hobbies are independent of each other, hence we have an MVD.
To solve this issue, we can decompose the above table in two table.
SID |
COURSE |
1 |
C1 |
2 |
C2 |
1 |
C2 |
SID |
HOBBY |
1 |
Playing |
2 |
Reading |
1 |
Dancing |
Fifth Normal Form(5NF):
In database normalization, Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJ/NF), requires a table to be free of join dependencies that are not implied (joins are based on the table’s candidate keys) by candidate keys.
Any relation in order to be in the fifth normal form must satisfy the following conditions:
- It must be in Fourth Normal Form (4NF).
- It should have no join dependency and also the joining must be lossless.
In the fifth normal form the relation must be decomposed in as many sub-relations as possible so as to avoid any kind of redundancy and there must be no extra tuples generated when the sub-relations are combined together by using natural join.
Name |
Skills |
Job |
A |
Marketing |
GK001 |
B |
PR |
GK002 |
P |
Graphic Designing |
GK003 |
We can decompose the table given above into these three tables given below.
<Student_Skills>
Name |
Skills |
Tag |
Marketing |
Barry |
PR |
Paulo |
Graphic Designing |
<Student_Job>
Name |
Job |
Tag |
GK001 |
Barry |
GK002 |
Paulo |
GK002
|
<Job_Skills>
Skills |
Job |
Marketing |
GK001 |
PR |
GK002 |
Graphic Designing |
GK003 |
The join relation of the three relations given above is equal to the very original relation without any duplicate values.
Domain-Key Normal Form (DKNF):
Definition: A relation is in DKNF if it is in BCNF and all constraints are expressed in terms of domain constraints and key constraints.
A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table.
Example
Consider a table called Rich Person, the domain for Status consists of the values 'Millionaire' and 'Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or equal to 1,000,000
The constraint dictates that a Millionaire will have a net worth of 1,000,000 to 999,999,999 inclusive, whilst a Billionaire will have a net worth of 1,000,000,000 or higher. This constraint is neither a domain constraint nor a key constraint.
The DKNF violation could be eliminated by removing the Status column. The wealthy person's status as a millionaire or billionaire is determined by their Net Worth in Dollars, as defined in the Status table, so no useful information is lost.
Rich_Person Table
PersonName |
Status |
Net Worth in Dollars |
A |
Millionaire |
424,553,621 |
B |
Billionaire |
8,533,228,793 |
C |
Billionaire |
5,829,562,999 |
D |
Millionaire |
792,545,211 |
Person Table
PersonName |
Net Worth in Dollars |
A |
424,553,621 |
B |
8,533,228,793 |
C |
5,829,562,999 |
D |
792,545,211 |
Status Table
Status |
Minimum |
Maximum |
Millionaire |
1,000,000 |
999,999,999 |
Billionaire |
1,000,000,000 |
999,999,999,999 |