Fall Semester 2002


Database Design (I): Normalization
Learning Objectives

Key Terms

Chapter Outline

I. Introduction
II. Functional Dependence
III. Keys
IV. First Normal Form
V. Second Normal Form
VI. Third Normal Form
VII. Incorrect Decompositions
VIII. Multivalued Dependencies and Fourth Normal Form
IX. Avoiding the Problem with Multivalued Dependencies
X. Summary
XI. Key Terms
XII. Review Questions
XIII. Premiere Products Exercises
XIV. Henry Books Case

Notes and Main Ideas

Introduction
The normalization process enables you to identify the existence of potential problems, called update anomalies, in the design of a relational database. To correct update anomalies in a database, you must convert tables into various types of normal forms. The most common normal forms are first normal form (1NF), second normal form (2NF), third normal form (3NF), and fourth normal form (4NF). Normalization is a progression in which a table that is in first normal form is better than a table that is not in first normal form, a table that is in second normal form is better than a table in first normal form, and so on.

Functional Dependence
Functional dependence is a formal name for what is basically a simple idea. A column (attribute), B, is functionally dependent on another column, A (or possibly a collection of columns), if a value for A determines a single value for B at any one time.

Keys
Column (attribute) A (or a collection of columns) is the primary key for a table (relation), R, if the following two properties hold (are true):
Property 1: All columns in R are functionally dependent on A.

Property 2: No subcollection of the columns in A (assuming that A is a collection of columns and not just a single column) also has Property 1.

A candidate key is a column or collection of columns that has the same properties presented in the definition of the primary key. Candidate keys that are not chosen to be primary keys are called alternate keys.

First Normal Form
An unnormalized relation is a relation (table) that contains a repeating group. A table (relation) is in first normal form (1NF) if it does not contain repeating groups.

Second Normal Form
Update anomalies fall into four categories: update, inconsistent data, additions, and deletions. These anomalies occur when a column is dependent on only a portion of the primary key. Second normal form eliminates update anomalies caused by partial dependencies. A table (relation) is in second normal form (2NF) if it is in first normal form and no nonkey attribute is dependent on only a portion of the primary key. An attribute is a nonkey attribute if it is not a part of the primary key.

Third Normal Form
Any attribute or collection of attributes that determines another attribute is called a determinant. A candidate key is an attribute or collection of attributes that could function as the primary key. Update anomalies also can occur when one nonkey attribute determines another nonkey attribute. A table (relation) is in third normal form (3NF) if it is in second normal form and the only determinants it contains are candidate keys.

Incorrect Decompositions
The decomposition of a table into two or more 3NF tables must be done by the method described in the previous sections, even though there are other possibilities that might seem to be legitimate. This section gives examples of some other possibilities and the problems they pose.

Multivalued Dependencies and Fourth Normal Form
In a table with columns A, B, and C, there is a multivalued dependence of column B on column A (also read as "B is multidependent on A" or "A multidetermines B") if each value for A is associated with a specific collection of values for B and this collection is independent of any values for C. A table is in fourth normal form (4NF) if it is in 3NF and there are no multivalued dependencies.

Avoiding the Problem with Multivalued Dependencies
Any table that is not in 4NF suffers some serious problems, but there is a way to avoid dealing with the issue. It is better to have a design methodology for normalizing tables that will prevent this situation from occurring in the first place. This section explains how to avoid these problems.

Quick Quiz
1. What term is used to describe a candidate key that is not chosen as the primary key? ANSWER: alternate key (highlight to see the answer)

2. What term is used to describe a column that is not part of the primary key? ANSWER: nonkey column or nonkey attribute (highlight to see the answer)

3. In what normal form (NF) is a table automatically if the primary key contains only a single column? ANSWER: second (2NF) (highlight to see the answer)

Additional thoughts and notes


Last updated: Sep 22, 2002 by Adrian German for A114/I111