Normalization

Last Updated: August 24, 2020

Definition - What does Normalization mean?

Normalization is the process of reorganizing data in a database so that it meets two basic requirements:

  1. There is no redundancy of data, all data is stored in only one place.
  2. Data dependencies are logical,all related data items are stored together.

Normalization is important for many reasons, but chiefly because it allows databases to take up as little disk space as possible, resulting in increased performance.

Normalization is also known as data normalization.

Techopedia explains Normalization

The first goal during data normalization is to detect and remove all duplicate data by logically grouping data redundancies together. Whenever a piece of data is dependent on another, the two should be stored in proximity within that data set.

By getting rid of all anomalies and organizing unstructured data into a structured form, normalization greatly improves the usability of a data set. Data can be visualized more easily, insights could be extracted more efficiently, and information can be updated more quickly. As redundancies are merged together, the risk of errors and duplicates further making data even more disorganized is reduced. On top of all that, a normalized database takes less space, getting rid of many disk space problems, and increasing its overall performance significantly.

The three main types of normalization are listed below. Note: "NF" refers to "normal form."

First normal form (1NF)

Tables in 1NF must adhere to some rules:

  • Each cell must contain only a single (atomic) value.

  • Every column in the table must be uniquely named.

  • All values in a column must pertain to the same domain.

Second normal form (2NF)

Tables in 2NF must be in 1NF and not have any partial dependency (e.g. every non-prime attribute must be dependent on the table’s primary key).

Third normal form (3NF)

Tables in 3NF must be in 2NF and have no transitive functional dependencies on the primary key.

The following two NFs also exist but are rarely used:

Boyce-Codd Normal Form (BCNF)

A higher version of the 3NF, the Boyce-Codd Normal Form is used to address the anomalies which might result if one more than one candidate key exists. Also known as 3.5 Normal Form, the BCNF must be in 3NF and in all functional dependencies ( X → Y ), X should be a super key.

Fourth Normal Form (4NF)

For a table to in 4NF, it must be in BCNF and not have a multi-valued dependency.

The first three NFs were derived in the early 1970s by the father of the relational data model, E.F. Codd. Almost all of today's relational database engines use his rules.

Some relational database engines do not strictly meet the criteria for all rules of normalization. An example is the multivalued fields feature introduced by Microsoft in the Access 2007 database application. There has been heated debate in database circles as to whether such features now disqualify such applications from being true relational database management systems.

Share this: