Thursday, November 23, 2023

Database - Normalization

Benefits of Normalization

  • Reduced Data Redundancy: Normalization eliminates duplicate data, saving storage space and improving database efficiency.
  • Improved Data Integrity: Normalization reduces the likelihood of data anomalies, ensuring data accuracy and consistency.
  • Simplified Data Manipulation: Normalization makes data updates and modifications easier and more efficient.
  • Enhanced Database Scalability: Normalization enables the database to accommodate growing data volumes without performance degradation.

Normalization vs. Denormalization

While normalization is generally beneficial, there may be situations where denormalization, the process of intentionally introducing redundancy, is considered. 

This is typically done to improve performance for specific queries or applications. 

However, denormalization should be done with caution, as it can lead to data anomalies and increased maintenance overhead

First Normal Form (1NF)

First Normal Form (1NF): Eliminates repeating groups and ensures that each cell in a table contains a single value.

Example 1: Student Registration System

Original Table:

1NF Table Structure:

Student Table:

Course Table

Second Normal Form (2NF)

2NF Original Table Example 1


2NF Table Structure:

Second Normal Form (2NF): Requires that all non-key attributes be fully dependent on the primary key, eliminating partial dependencies.

Course Table:

Prerequisites Table

2NF Original Table Example 2

Online Book Store Original Table:

2NF Table Structure:

Third Normal Form (3NF)

Third Normal Form (3NF): Eliminates transitive dependencies, ensuring that non-key attributes are directly dependent only on the primary key.

Example: University Course Catalog

Original Table

3NF Table Structure:

Department Table

Prerequisite Table

Instructor table


  • The instructor attribute is not fully dependent on the primary key (Course ID), as there could be multiple instructors assigned to a course. Therefore, the instructor attribute should be moved to a separate instructor table with its own primary key.
  • 3NF is a more stringent form of normalization than 2NF, and it can help to further reduce data redundancy and improve data integrity. However, 3NF can also make database queries more complex, and it may not be necessary for all databases.
  • In general, you should follow 3NF normalization for databases that are subject to frequent updates or that need to maintain a high level of data integrity. However, for databases that are not frequently updated or that do not require a high level of data integrity, 2NF normalization may be sufficient.

When to use each normal form:

1NF:

  • Use 1NF for the most basic level of data organization.
  • It eliminates repeating groups and ensures that each cell contains only one value.
  • It is the minimum requirement for a relational database.

2NF:

  • Use 2NF when you need to further reduce data redundancy and eliminate partial dependencies.
  • It ensures that all non-key attributes are fully dependent on the entire primary key, not just a subset of it.
  • It provides a better balance between data integrity and efficiency compared to 1NF.

3NF:

  • Use 3NF when you need the highest level of data integrity and elimination of transitive dependencies.
  • It ensures that no non-key attribute is transitively dependent on another non-key attribute.
  • It provides the highest level of data integrity but may result in more complex queries.

Here's a general guideline for when to use each normal form:

  • Use 1NF for simple databases with minimal data redundancy.
  • Use 2NF for databases with more complex data relationships and a need to reduce redundancy.
  • Use 3NF for databases with high data integrity (high data accuracy and consistency ) requirements and where transitive dependencies (indirect dependency relationships) can lead to anomalies

No comments:

Post a Comment

LeetCode C++ Cheat Sheet June

🎯 Core Patterns & Representative Questions 1. Arrays & Hashing Two Sum – hash map → O(n) Contains Duplicate , Product of A...