Normalization

Abhinav Kumar gupta
4 min readNov 22, 2020

--

A technique of organizing the data into multiple related tables, to minimize Data Redundancy. data redundancy is nothing but a repetition of similar data at multiple tables and we have to reduce it not just because the repetition of similar data multiple times eats up extra space but it also leads to multiple other issues like insertion, deletion, and updation anomalies.

Insertion Anomaly

To insert redundant data for every new row is a data insertion anomaly.

Deletion Anomaly

loss of a related dataset when some other dataset is deleted.

normalization will solve this problem by breaking the existing table into two different tables.

Types of normalization-

First Normal Form-

Every Table in your Database should at least follow the 1st Normal Form, always. there are 4 basic rules that a table should follow to be in the 1st Normal Form.

  1. Each column should contain atomic values. Entries like X, Y, and W, X violate this rule.
  2. A Column should contain values that are of the same type. Do not intermix different types of values in any column.
  3. A Column should have a unique name. Same names lead to confusion at the time of data retrieval.
  4. The order in which data is saved does not matter. Using the SQL query, you can easily fetch data in any order from a table.

Second Normal Form-

For a table or relationship to be in the Second Normal Form first it should be in the 1st Normal Form and it should not have any Partial Dependencies.

Example: In a Student project details table there are four columns StudentId, ProjectId, StudentName, and ProjectName. the StudentName can be determined by StudentId and ProjectName can be determined by ProjectId both are show the partial dependencies.

To remove the partial dependency decompose the table into two parts the first table will contain StudentId, ProjectId, and StudentName and the second table will contain ProjectId and ProjectName.

Third Normal Form-

For a table or a relationship to be a Third Normal Form it must be in Second Normal Form and it should not have Transitive Dependency.

Example: In the MovieListing table there are four columns which are Movie_ID, Listing_ID, Listing_Type, DVD_Price, and Movie_ID->Listing_Type i.e. transitive functional dependency.

Boyce-Codd Normal Form-

For a table or a relationship to be a Boyce-Codd Normal Form it should be in the 3rd Normal Form and any dependency A->B, A should be a super key. which means, for, A->B if A is non-prime and B is a prime attribute. BCNF deals with the cases where “Non-Prime in a functional dependency derives a prime attribute”.

Consider the example -

Students can join multiple courses and one teacher teaches one course. each student can take a course from one teacher only. then Functional Dependencies are student# and course ID: Teacher ID but teacher ID also determines the course teacher ID: course ID so one needs to split the table into student-teacher and course teacher tables.

Fourth Normal Form-

For a table or a relationship to be a Forth Normal Form it should satisfy BCNF and it should not have Multi-Valued Dependency.

Any dependency A->B is Multi-Valued Dependency then B and C should be independent of each other.

1)A->>B, for a single value of A, more than one value of B exists.

2)The table should have at least 3 columns.

3)For this table with A, B, C columns, B and C should be independent.

If all these are true, then we can say that the table may have Multi-valued Dependency. A table can have both Functional Dependency and Multi-Valued Dependency.

Consider the table which contains the course, student, and paper data. One course can be taken by many students one course can have many papers then the multi-valued dependencies are as course: student and course: papers.

--

--