## Normalization

### 11.1 Top View

Eliminate anomalies by minimizing duplication: divide data between tables

• Each relational scheme deals with a single theme
• Likely to introduce interrelation (i.e., referential integrity) constraints
• Make it more efficient to maintain databases
• May make data more difficult to access
Students
 St-Name Status Ben senior Dan freshment
Courses
 Course-Name Course-# db 670 ds 680
 St-Name Course-# Salary Ben 670 500 Ben 680 500 Dan 670 400 Dan 680 600

### 11.2 The Approach

• Normal forms provide
• Yard sticks for identifying bad designs
• Guidelines for eliminating the problems

• Analysis: Top-down
• Evaluate the relation schemas against the normal forms
• Decompose when the criteria are not met

• Synthesis: Composes relations

### 11.3 Types of Normal Forms

 Functional dependencies over primary keys: 1st normal form 2nd normal form 3rd normal form Boyce-Codd normal form Multivalues dependencies: 4NF Join dependencies: 5NF

### 11.4 First Normal Form

Allow only atomic values in attributes

### Normalization through Decomposition

• Attributes violating 1NF and place it in a separate relation, along with the primary key
 A B C x 1 a,b z 2 c
 A B x 1 z 2
 B C 1 a 1 b 2 c

• The new relation uses all the attributes for a primary key

### Normalization through Unrolling Multivalues

 A B C x 1 a,b z 2 c
 A B C x 1 a x 1 b z 2 c

• Introduces duplicated data

### Normalization through Unrolling Attributes

• Assumes a limit on the cardinality of multivalues
• Needs NULL values
 A B C x 1 a,b z 2 c
 A B C C’ x 1 a b z 2 c NULL

### 11.5 Second Normal Form

Attributes not in the primary key should be fully dependent on the primary key.

 SSN NAME PROJ-# PROJ-NAME PROJ-LOC HOURS
partial functional
dependencies
SSN NAME
PROJ-# PROJ-NAME, PROJ-LOC

full functional
dependencies
SSN, PROJ-# HOURS

### Normalization

Set a new table for each partial key and its dependent attributes.

 SSN PROJ-# HOURS
 SSN NAME
 PROJ-# PROJ-NAME PROJ-LOC

### 11.6 Third Normal Form

Attributes not in the primary key should be functionally dependent only on the primary key.

 SSN NAME DEPT-# DEPT-NAME

DEPT-# DEPT-NAME

Every non-key attribute must be a fact about the key, the whole key, and nothing but the key.

### Normalization

• Create a separate relationship schema for the non-key attributes that functionally depend on non-key attributes
• Designate the latter attributes as primary keys.
 SSN NAME DEPT-#
 DEPT-# DEPT-NAME

• The new schemas represent independent entity facts about employees and departments.
• Natural join recovers the original relation.
• Without the decomposition, the schema is subject to update, deletion, and insertion anomalies.

### 11.7 Example

Exercise 10.27, p. 329 Consider the universal relation R = {A,B,C,D,E,F,G,H,I,J} and the set of functional dependencies

1. AB C
2. BD EF
4. A I
5. H J

For the above data

1. Determine the key of R
2. Decompose R into 2NF relations
3. Decompose the outcome into 3NF relations

Sketch of Solution

• Draw a schema with arrows showing the functional dependencies (e.g., fig 14.11(a))
• Find the primary key
• Find the closures of the proper nonempty subsets of the primary key
• Use the above closures to perform 2NF decompositions
• Find the closures of the proper nonempty subsets of the attributes not included in the primary key
• Use the above closures to perform 3NF decompositions

### 11.8 Boyce-Codd Normal Form (BCNF)

If X Y is non-trivial then X is a super key

 PROJECT DEPT MANAGER
• DEPT,PROJECTMANAGER
MANAGERDEPT
• Insertion anomaly: the DEPT of a MANAGER can’t be stored, if the PROJECT is not given

### Normalization

 PROJECT MANAGER
 MANAGER DEPT

### 11.9 Generalization

Assume keys instead of primary keys.

 Non-2NF Partial dependency Non-3NF transitive dependency: non-BCNF Y is not in X, Y ⊆ key, and X is not a superkey.

### 11.10 Assignment #6

Due: Mo, Feb 23

Problems 10.18 (a, c, e, g, i), 10.20, 10.26 and 10.29 from textbook (pages 328–330 4th ed; 373–374 5th ed).

Notes

• If you submit your homework electronically, use the departmental submit utility: submit XXX lab6 filename. XXX represents ‘c670aa’ for students of the 3:30 section, and ‘c670ab’ for students of the 5:30 section.

• Files are restricted to 1,048,576 bytes
• The submit program issues error messages to the student’s CSE email account

Reference: Ch. 10.3–10.6 in textbook.