Fall Semester 2002


Database Design (II): Design Methodology
Learning Objectives

Key Terms

Chapter Outline

I. Introduction
II. User Views
III. Information-Level Design Methodology
IV. Represent a User View as a Collection of Tables
V. Normalize the Tables
VI. Represent All Keys
A. Database Design Language (DBDL)
B. Entity-Relationship Diagrams
VII. Merge the Result into the Design
VIII. Database Design Examples
IX. Physical-Level Design
X. Entity-Relationship Diagram Alternatives
XI. Top-Down Versus Bottom-up
XII. Survey Form
XIII. Obtaining Information from Existing Documents
XIV. One-to-One Relationship Considerations
XV. Many-to-Many Relationship Considerations
XVI. Nulls and Entity Subtypes
XVII. Avoiding Problems with Third Normal Form When Merging Tables
XVIII. Summary
XIX. Key Terms
XX. Review Questions
XXI. Premiere Products Exercises
XXII. Henry Books Case

Notes and Main Ideas

User Views
Database design is a two-step process. The first step is called information-level design, and it is completed independently of any particular DBMS. In the second step, physical-level design designers transform the information-level design into a design for a specific DBMS. A user view is a set of requirements that is necessary to support the operations of a particular database user. For each user view, designers must design the database structure to support the view and then merge it into a cumulative design, that is, a design that supports all the user views encountered thus far in the design process.

Information-Level Design Methodology
The information-level design methodology presented in this text involves representing individual user views, refining them to eliminate any problems, and then merging them into a cumulative design. When creating user views, a user can be a person or group that will use the system, a report, or a type of transaction. For each user view, the methodology requires you to complete the following steps:

  1. Represent the user view as a collection of tables
  2. Normalize these tables
  3. Identify all keys in these tables
  4. Merge the results of Steps 1 through 3 into the cumulative design.

Represent a User View as a Collection of Tables
When given a user view or some sort of stated requirement, you must develop a collection of tables that will support it. Determine entities involved and create a separate table for each type of entity. Determine the primary keys for each of these tables. Determine the properties for each of these entities. Determine the relationships among the entities.

Normalize the Tables
Once relationships are established between entities, the next task is to normalize each table, with the target being third normal form.

Represent All Keys
For each table, you must identify the primary key, any alternate keys, secondary keys, and foreign keys. Database Design Language (DBDL) is a mechanism for representing tables and keys. In DBDL, you represent a table by listing all columns and then underlining the primary key. Below the table definition, you list any alternate keys, secondary keys, and foreign keys. An entity-relationship (E-R) diagram visually represents the structure of a database. There are several different styles of E-R diagrams. This text uses a style called IDEF1X.

Merge the Result into the Design
As soon as you have completed steps 1 through 3 (represent view as a collection of tables, normalize, identify all key) for a given user view, you can merge the results into a cumulative design. You combine tables that have the same primary key to form a new table, eliminate duplicate columns, check the design to determine if the new tables are in third normal form. Repeat the process for each user view.

Database Design Examples
This section walks you through the information-level design of the two databases used in this text, Premiere Products and Henry Books.

Physical-Level Design
The physical-level design begins after the information-level design is completed. You must implement the design for a specific DBMS. Most DBMS support primary, alternate, secondary, and foreign keys. If a DBMS does not support these keys, you must devise an alternate scheme to ensure uniqueness of primary and alternate keys. You also must ensure that values in foreign keys are legitimate, that is, they match the value of the primary key in some row in another table. For secondary keys, it should be possible to retrieve data rapidly on the basis of a value of the secondary key.

Entity-Relationship Diagram Alternatives
This section covers alternative styles for E-R diagrams.

Top-Down Versus Bottom-Up
You can design a database using a bottom-up design methodology or a top-down design methodology. In a bottom-up design methodology, specific user requirements are synthesized into a design. A top-Down design methodology begins with a general database design that models the overall enterprise and that repeatedly refines the model to achieve a design that supports all necessary applications. One advantage of a top-down approach is that there is a more global feel to the project. A bottom-up approach provides a rigorous way of tackling each separate requirement and ensuring that it will be met. The ideal strategy combines the best of both approaches.

Survey Form
When designing a database, it may be helpful to design a survey form to obtain the required information from users. The survey form should contain the following information: entity information, attribute (column) information, relationships, functional dependencies, and processing information.

Obtaining Information from Existing Documents
Existing documents, such as invoices, can provide helpful information concerning the database design. If an organization has a computerized system, current file layouts can furnish additional information about entities and attributes.

One-to-One Relationship Considerations
This section discusses potential problems and possible solutions for implementing one-to-one relationships between tables.

Many-to-Many Relationship Considerations
This section discusses potential problems and possible solutions for implementing many-to-many relationships between tables. It also describes many-to-many-to-many relationships that involve three entities.

Nulls and Entity Subtypes
A null actually represents the absence of a value in a field. Nulls are used when a value is either unknown or inapplicable. Suggested methods for handling null values are discussed. When you remove a column that contain null values from a table and create a separate table containing only non-null values, you create an entity subtype. There are specific methods for representing entity subtypes in E-R diagrams.

Avoiding Problems with Third Normal Form When Merging Tables
When you combine third normal form tables, the result may not be in third normal form. You can attempt to avoid the problem of creating a table that is not in third normal form by being cautious when representing user views. The problem occurs when a column A in one user view functionally determines a column B in a second user view. Thus, column A is a determinant for column B, yet column A is not a column in the second user view.

Quick Quiz

  1. What is the set of requirements that is necessary to support the operations of a particular database user?
    ANSWER: (user view)

  2. What type of key is a column or collection of columns that could have been chosen as a primary key but was not?
    ANSWER: (alternate)

  3. What is a popular type of diagram that visually represents the structure of a database?
    ANSWER: (entity-relationship (E-R) diagram)

Important Aspects (Self-Study Tips)

User view 1 (Enter/edit territories):
Territory (TerritoryNum, TerritoryName)
User view 2 (Enter/edit Reps):
Rep (RepNum, LastName, FirstName, Address, City, State, Zip, MTDSales, 
     YTDSales, MTDCommission, YTDCommission, CommissionRate, TerritoryNum)

FK TerritoryNum -> Territory
User view 3 (Enter/edit customers):
Customer (CustomerNum, CustomerName, Address1, Address2, City,
          State, Zip, MTDSales, YTDSales, CurrentBalance, CreditLimit, 
          ShipName, ShipAddress1, ShipAddress2, ShipCity, ShipState, 
          ShipZip, RepNum)

FK RepNum -> Rep

Comment:

If the ship-to address fields are null for a particular customer, this customer has no special ship-to address. In this case, goods are to be shipped to the customer's regular address.
User view 4 (Enter/edit parts):
Part (PartNum, Description, UnitPrice, MTDSales, YTDSales, 
      OnHand, Allocated, ReorderPoint)
User view 5 (Enter/edit vendors):
Vendor (VendorNum, VendorName, Address, City, State, Zip)

VendorPart (VendorNum, PartNum, Price, MinimumOrderQuantity, ExpectedLeadTime)

FK VendorNum -> Vendor	
FK PartNum -> Part	
User view 6 (Order entry):
Orders (OrderNum, OrderDate, CustomerNum, CustomerPONum)
		
FK CustomerNum -> Customer

OrderDetail (OrderNum, SEQNum, PartNum, NumOrdered, QuotedPrice)

FK OrderNum -> Orders	
FK PartNum -> Part

Comment:

You might ask how the design would differ if users are to be able to override the ship-to address for a customer on a particular order; i.e., the ship-to address for the order is allowed to be different from the ship-to address currently on file for the customer. In this case, the Orders table would also include ship-to fields. These must accept nulls, since they will not apply to each order. If on a given order the values are not null, they are understood to be the ship-to address. If they are null, the ship-to values from the Customer table are used instead.

Cumulative collection thus far:
Territory (TerritoryNum, TerritoryName)

Rep (RepNum, LastName, FirstName, Address, City, State, Zip, MTDSales, 
     YTDSales, MTDCommission, YTDCommission, CommissionRate, TerritoryNum)
     
FK TerritoryNum -> Territory

Customer (CustomerNum, CustomerName, $ddress1, Address2, City, State, 
          Zip, MTDSales, YTDSales, CurrentBalance, CreditLimit, ShipName, 
          ShipAddress1, ShipAddress2, ShipCity, ShipState, ShipZip, RepNum)

FK RepNum -> Rep

Part (PartNum, Description, UnitPrice, MTDSales, YTDSales, 
      OnHand, Allocated, ReorderPoint)

Vendor (VendorNum, VendorName, Address, City, State, Zip)

VendorPart (VendorNum, PartNum, Price, MinimumOrderQuantity, ExpectedLeadTime)

FK VendorNum -> Vendor	
FK PartNum -> Part	

Orders (OrderNum, OrderDate, CustomerNum, CustomerPONum)

FK CustomerNum -> Customer

OrderDetail (OrderNum, SEQNum, PartNum, NumOrdered, QuotedPrice)

FK OrderNum -> Orders	
FK PartNum -> Part
User view 7 (Invoicing cycle):
Customer (CustomerNum, CurrentInvoiceTotal, ...)

Orders (OrderNum, ...)

OrderDetail (OrderNum, SEQNum, NumShipped, ...)
Note:
Other fields in the Customer, Orders, and OrderDetail tables are already in the cumulative collection and will not add anything new.
Invoice(InvoiceNum, ShipDate, Freight, OrderNum, ShipStatus)

FK OrderNum -> Orders	

InvoiceSummary (InvoiceNum, InvoiceDate, InvoiceTotal, Freight, CustomerNum)

FK CustomerNum -> Customer
Comment:
Invoice records are created during the first phase of the invoicing cycle, when users are entering the numbers of the orders to release for invoicing. These records are used as invoices are printer, files are updated, and invoice summary records are created. Finally, as released orders are deleted from the database, these records are deleted also. Because of the limited life of these records and the relatively simple manner in which they are used, they are often kept external to the database in a simple flat file. In this case, deleting these records is accomplished by deleting the entire file. This is just an alternative approach we may keep in mind.

Another thing worth paying attention to concerns the relationship, or, rather, the lack of a relationship between Invoice and InvoiceSummary. Both tables have the same primary key and yet they should not be combined. To appreciate the reason there is no relationship, you have to consider the sequence of events: an Invoicerecord is created that must match some order, the invoice is printed, files are updated, an InvoiceSummary record is created, and then the order and corresponding Invoice record are deleted. Thus, an Invoice record cannot be required to match an InvoiceSummary record since the InvoiceSummary record is created after the corresponding Invoice record. An InvoiceSummary record cannot be required to match an Invoice record, since the Invoice record will be deleted long before the InvoiceSummary record will. (InvoiceSummary records are not deleted until the end of the month.) For similar reasons the Invoice and InvoiceSummary tables cannot be combined. Note that if the Invoice information is kept outside of the database, we are not even faced with such an issue.

This is a good place to discuss about system-generated numbers, that is, users may wish to assign invoice numbers themselves or, what is actually more likely, they might prefer that the system automatically assigns these numbers. Assuming that the numbers are assigned sequentially, the system merely needs to store the last invoice number assigned. Then, when a new invoice is created, the system will retrieve this number, increment it by one giving the new invoice number, and then update the field containing the last number assigned with this new number. A natural question to raise at this point is: Where should this number be stored? Should it be a field in some special record in the database or in some separate file? It could actually go either place. If the number is only used by programs that access the database, placing it in the database is preferable. If other programs also use it, it is usually preferable to place it in a separate file.

User view 8 (Receive payments):
Customer (CustomerNum, CurrentPaymentTotal, ...)
Note:
Other fields in the Customer table are already in the cumulative collection and will not add anything new.
Payment (PaymentNum, CustomerNum, Payment Date, Amount)

FK CustomerNum -> Customer
Comment:
The same comment made concerning the system generating invoice numbers also applies to payment numbers.

User view 9 (Statements):
Customer (CustomerNum, CustomerName, Address1, Address2, City, 
          State, Zip, CurrentBalance, CreditLimit, RepNum, CurrentInvoiceTotal, 
          CurrentPaymentTotal, CurrentAmount, Over30, Over60, Over90, 
          PreviousBalance)
Final cumulative collection:
Territory (TerritoryNum, TerritoryName)

Rep (RepNum, LastName, FirstName, Address, City, State, Zip, 
     MTDSales, YTDSales, MTDCommission, YTDCommission, 
     CommissionRate, TerritoryNum)

FK TerritoryNum -> Territory

Customer (CustomerNum, CustomerName, Address1, Address2, City,
          State, Zip, MTDSales, YTDSales, CurrentBalance, 
          CreditLimit, ShipName, ShipAddress1, ShipAddress2, 
          ShipCity, ShipState, ShipZip, CurrentInvoiceTotal,
          CurrentPaymentTotal, CurrentAmount, Over30, Over60, 
          Over90, PreviousBalance, RepNum)

FK RepNum -> Rep	

Part (PartNum, Description, UnitPrice, MTDSales, YTDSales, 
      OnHand, Allocated, ReorderPoint)

Vendor (VendorNum, VendorName, Address, City, State, Zip)

VendorPart (VendorNum, PartNum, Price, MinimumOrderQuantity,
            ExpectedLeadTime)

FK VendorNum -> Vendor	
FK PartNum -> Part	

Orders (OrderNum, OrderDate, CustomerNum, CustomerPONum)

FK CustomerNum -> Customer

OrderDetail (OrderNum, SEQNum, PartNum, NumOrdered, QuotedPrice, NumShipped)

FK OrderNum -> Orders	
FK PartNum -> Part

Invoice(InvoiceNum, ShipDate, Freight, OrderNum, ShipStatus)

FK OrderNum -> Orders	

InvoiceSummary (InvoiceNum, InvoiceDate, InvoiceTotal, Freight, CustomerNum)

FK CustomerNum -> Customer	

Payment (PaymentNum, CustomerNum, PaymentDate, Amount)

FK CustomerNum -> Customer
Once again, you should feel free to use this set of notes in any way you like to prepare But please turn in an assignment that (even if it relies heavily on this material, should that be the case) is yours beyond any reasonable doubt. We're interested in your exposure to core problems of relational database management systems, and their solutions.


Last updated: Oct 13, 2002 by Adrian German for A114/I111