1NF, 2NF, 3NF Explained | Database Normalization Steps
# CHAPTER 10
First, Second, and Third Normal Forms
1. Introduction
In Chapter 9, we learned that Data Redundancy corrupts databases. To eradicate this corruption, Database Architects run their schema designs through a strict, 3-step mathematical checklist. These steps are called the Normal Forms. If a database passes all three tests, it is considered structurally sound and "enterprise-ready." In this chapter, we will walk through the exact rules of 1NF, 2NF, and 3NF to perfectly normalize a chaotic dataset.2. Learning Objectives
By the end of this chapter, you will be able to:- Understand the progressive nature of the Normal Forms.
- Apply First Normal Form (1NF) by ensuring atomic columns.
- Apply Second Normal Form (2NF) by eliminating partial dependencies.
- Apply Third Normal Form (3NF) by eliminating transitive dependencies.
- Transform a messy spreadsheet into a flawless relational schema.
3. The Starting Point: The Unnormalized Table (UNF)
Imagine a small business tracking their data in a single, chaotic Excel spreadsheet:Table: StudentCourses
| studentid | name | phonenumbers | courseid | coursename | instructor | instructoremail |
*Data Row:* | 1 | John | 555-1234, 555-9999 | 101 | Math | Dr. Smith | smith@uni.edu |
This is a disaster. Let's fix it step by step.
4. First Normal Form (1NF)
The Rule: A table is in 1NF if every column is "Atomic" (indivisible). You cannot have arrays, lists, or multiple values in a single cell.*The Violation:* John's row has phonenumbers = 555-1234, 555-9999. A comma-separated list violates 1NF. The database cannot effectively search or join on that column.
*The Fix:* We must break it apart. In a relational database, multi-valued attributes must be split into multiple rows or moved to a new table.
Table: Students
| studentid | name | courseid | coursename | instructor | instructoremail |
Table: StudentPhones
| studentid | phonenumber | (Row 1: 1, 555-1234. Row 2: 1, 555-9999).
*Status: We are now in 1NF!*
5. Second Normal Form (2NF)
The Rule: A table is in 2NF if it is already in 1NF, AND it has no "Partial Dependencies." This means every non-key column must depend on the *entire* Primary Key, not just a piece of it. (This rule only applies to tables with *Composite Primary Keys*).*The Violation:* Look at our 1NF Students table. The true unique identifier (Primary Key) for a row is the combination of (studentid, courseid).
Does coursename depend on the studentid? No! The name of the math class is "Math" regardless of whether John is taking it. It depends *only* on the courseid. This is a Partial Dependency. It causes massive data duplication.
*The Fix:* Break the offending columns into their own table!
Table: Students
| studentid (PK) | name |
Table: Courses
| courseid (PK) | coursename | instructor | instructoremail |
Table: Enrollments (The Bridge)
| studentid (FK) | courseid (FK) |
*Status: We are now in 2NF!*
6. Third Normal Form (3NF)
The Rule: A table is in 3NF if it is already in 2NF, AND it has no "Transitive Dependencies." A transitive dependency is when *Column A depends on Column B, and Column B depends on the Primary Key.* As Edgar F. Codd famously summarized: *"Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key, so help me Codd."**The Violation:* Look at our 2NF Courses table:
| courseid | coursename | instructor | instructoremail |
Does instructoremail depend on the courseid? No! The email depends on the instructor. The instructor depends on the course. This is a Transitive Dependency! If Dr. Smith changes his email, we have an Update Anomaly across every course he teaches.
*The Fix:* Break them apart!
Table: Courses
| courseid | coursename | instructorid (FK) |
Table: Instructors
| instructorid (PK) | name | email |
*Status: We are now in 3NF! The database is structurally perfect.*
7. Mini Project: Normalizing an Invoice
UNF:| invoiceid | date | customername | customerzip | itemid | itemprice |
-
1.
1NF: The invoice contains multiple items. Split it! Create an
InvoiceItemstable to hold the line items, removing repeating groups.
-
2.
2NF:
itempricedepends only onitemid, not theinvoiceid. Move Items to aProductstable!
-
3.
3NF:
customerzipdepends oncustomername, not theinvoiceid. Move Customers to aCustomerstable!
8. Common Mistakes
- Skipping Steps: You cannot jump straight to 3NF. Normalization is strictly sequential. You must pass 1NF to qualify for 2NF, and pass 2NF to qualify for 3NF.
9. Best Practices
-
Stop at 3NF for Web Apps: While higher normal forms exist (BCNF, 4NF, 5NF), 3rd Normal Form is universally considered the "sweet spot" for 95% of standard web applications, balancing perfect data integrity with reasonable
JOINperformance.
10. Exercises
-
1.
What normalization rule strictly prohibits storing a comma-separated list of items (like
"Red, Blue, Green") inside a single column?
- 2. If Column A relies on Column B, and Column B relies on the Primary Key, what type of dependency is this, and which Normal Form does it violate?
11. Database Design Challenges
You have a table:Employees(empid, empname, departmentid, departmentlocation).
The departmentlocation depends strictly on the departmentid, not the empid.
What Normal Form is currently being violated, and how do you fix it?
*(Answer: It violates 3NF due to a Transitive Dependency. Fix: Move departmentid and departmentlocation into a new Departments table).*
12. MCQ Quiz with Answers
To satisfy the requirements of First Normal Form (1NF), what structural rule must be mathematically enforced on every column within a table?
According to the rules of Third Normal Form (3NF), "Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key." What specific type of database dependency does 3NF eliminate?
13. Interview Questions
- Q: Recite the core rules for 1NF, 2NF, and 3NF. Explain how sequentially applying these rules eliminates the Update, Insertion, and Deletion anomalies discussed in the previous chapter.
-
Q: A developer has a
Userstable and adds a column calledfavoritecolorswhere they plan to store JSON arrays like["Red", "Blue"]. Explain why this violates 1NF, and how you would architect the correct Relational solution.