|
Spring Semester 2007
|
Lecture Notes Seven: Principles of Database Design
|
This week we will explore HTTP.
|
There's not much to explore, things are pretty simple.
|
|
Next week we start working with MySQL.
|
In that respect we need to clarify how information is to be structured.
|
|
MySQL is an RDBMS.
|
RDBMS?
|
|
It's a Relational Database Management System.
|
So our databases will be relational?
|
|
Yes, they will be composed of tables (tables are relations).
|
We need to know how one usually structures information for storage in such a database.
|
|
There are some mathematical principles behind it.
|
Let's review them through an example.
|
|
OK. Here's our situation.
|
This is a hypothetical situation.
|
Premiere Products is a distributor of appliances, housewares, and sporting goods. Since its inception, the company
has used spreadsheet software to maintain customer, order, inventory and sales representative (sales rep) data. Management has
determined that the company's recent growth means it is no longer feasible to use spreadsheets to maintain its data. What has led
the managers at Premiere Products to this decision? One of the company's spreadsheets, shown below, displays sample order data and
illustrates the company's problems with the spreadsheet approach: Redundancy is one problem with the orders spreadsheet.
Redundancy wastes space. It also makes your changes more cumbersome, and that may lead to inconsistencies.
|
So the management determines spreadsheet software must be replaced. |
The new system must have the following specifications.
|
The following information about sales reps, customers and parts inventory:
- Sales rep number, last name, first name, address, total commission, and commission rate for each rep.
- Customer number, name, address, current balance and credit limit per customer, along with tyhe number of the sales rep representing the customer.
- The company must store the part number, description, number of units on hand, item class, number of the warehouse where the item is stored, and unit
price for each part in inventory.
Premiere Products ,ust also store information about orders:
The sample order has three components:
- The heading (top) of the order contains the order number and date; the customer's number, name, and address; and the sales rep number and name.
- The body of the order contains one or more order lines, sometimes called line items. Each order line contains a part number, a part description,
the number of units of the part ordered, and the quoted price for the part. Each order line also contains a total, usually called an extension, which
is the result of multiplying the number ordered by the quoted price.
- Finally, the footing (bottom) of the order contains the order total.
|
If this is what the order looks like, what do we store?
|
Basically everything on it, but here's a better rundown on those items.
|
Premiere Products must also store the following items for each customer's order:
- For each order, it must store the order number, the date the order was placed, and the number of the customer that placed the order. The customer's name
and address and the number of the sales rep who represents the customer are stored with customer information. The name of the sales rep is stored with the sales
rep information.
- For each order, it must store the order number, the part number, the number of units ordered, and the quoted price. Remember that the part description is
stored with the information about parts. The result of multiplying the number of units ordered by the quoted price is not stored because the computer can
calculate it when necessary.
- The overall order total is not stored. Instead, the computer calculates the total whenever an order is printed or displayed on the screen.
|
Now, to keep a long story short, here's how the data might be organized.
|
This is just a very crude, intuitive approach.
|
|
Note that we want to shortcut through 30 years of theory.
|
So we work this out, give you only the highlights.
|
|
So we could start as follows.
|
Basically, organize our operation around what we sell.
|
Part ( PartNum,
Description,
OnHand,
Class,
Warehouse,
Price,
(OrderNum, OrderDate, CustomerNum, CustomerName, RepNum, RepName, NumOrdered, QuotedPrice)
)
|
A few things are missing.
|
Commission rate, commission thus far, etc.
|
|
They can be easily added a bit later.
|
What's the significance of the parentheses?
|
|
They group together the fields of a table.
|
So that means we have a table for each line in Part?
|
|
Yes. That's called an unnormalized relation.
|
How do we normalize?
|
|
First, eliminate repeating groups.
|
But then we'd be so very extremely redundant!
|
|
Sure, but that's just the start.
|
OK. That puts us in the First Normal Forms (no repeating groups).
|
|
We then need to determine functional dependencies.
|
Better give me an example instead of (or along with) a definition.
|
|
A relation (table) that contains a repeating group (or multiple entries for a single record) is called an unnormalized relation.
|
We knew that, already.
|
|
Removal of repeating groups is the starting point in the quest to create tables that are as free of problems as possible.
Tables without repeating groups are said to be in the first normal form.
|
Reaching first normal form is an ugly stage in our case.
|
|
I agree, because of the horrendous redundancy that results.
|
So the next step is to determine functional dependencies.
|
|
Tell me once again what those are.
|
Let me just list them here in this case.
|
PartNum determines:
- Description
- OnHand
- Class
- Warehouse
- Price
|
There are some conventions in here, some implicit.
|
Like the fact that a part is stored in a warehouse and only one.
|
|
Yes, but those are not entirely hard to see with the naked eye.
|
Correct. Let's list the other functional dependencies.
|
OrderNum determines:
- OrderDate and
- CustomerNum, which determines:
- CustomerName, and
- RepNum, which determines
RepName
|
|
So functional dependencies are almost like cause-effect links?
|
|
Something like that: a column (attribute) B is functionally dependent on another column A (or possibly a collection of columns)
if each value for A in the database is associated with exactly one value of B.
|
If you know the value of A that determines the value for B.
|
|
So that's what we have above: if you know the part number you know what the part looks like,
how much you still have on hand, in what warehouse you will find it, what the catalogue price is
and so on.
|
Notice that in the definition B can be functionally dependent on a collection of columns.
|
|
We have this situation here.
|
Interesting.
|
(PartNum, OrderNum) determines
|
Yes, that's a logic consequence of what an order is.
|
I see. What do we do once we determine the functional dependencies?
|
|
We need a few more definitions.
|
Let's list them along with the action we need to take.
|
A column is a nonkey column (also called a non-key attribute) if it is not a part of the primary key.
|
What's the primary key in our case?
|
It's composite:
(PartNum, OrderNum)
|
|
There's something weird about it, I agree.
|
Well, that's what we want to point out now.
|
A table is in second normal form (2NF) if it is in first normal form and no nonkey column is dependent on only a portion of the primary key.
|
So, eliminating these partial dependencies is what we need to do now.
|
Precisely. What are some problems partial dependencies can create?
|
|
They're called update anomalies and they fall into four categories:
|
Eliinate them and we have a 2NF database.
|
- update (change of description of part DR93 is a nightmare, an avalanche of changes)
- inconsistent data (what if miss some of the updates in the avalanche, we're suddenly off)
- additions (adding a new part is a big issue: the key is composite, until someone orders it, it can't have a row in the table)
- deletions (deleting the only row mentioning some part number; this is the reverse of the problem listed just earlier)
|
Here's what I think we need to do.
|
Create two extra tables.
|
Part: (PartNum, OnHand, Class, Warehouse)
Order: (OrderNum, OrderDate, CustomerNum, CustomerName, RepNum, RepName)
PartOrder: (PartNum, OrderNum, NumOrdered, QuotedPrice)
|
Exactly.
|
That removes partial dependencies.
|
|
But now we have another issue.
|
What?
|
|
We need to make sure that all determinants are candidate keys.
|
That would bring us in the third normal form.
|
|
Can we characterize this situation a bit better?
|
Certainly. We need one preliminary definition, though.
|
Any column (or collection of columns) that determines another column is called a determinant.
A table is in third normal form (3NF) if it is in second normal form and the only determinants are candidate keys.
|
I see what we need to do: create a separate Customer table.
|
Indeed, CustomerNum is a determinant, non candidate key.
|
|
What do we get?
|
We're still not done!
|
Part: (PartNum, OnHand, Class, Warehouse)
Order: (OrderNum, OrderDate, CustomerNum)
Customer: (CustomerNum, CustomerName, RepNum, RepName)
PartOrder: (PartNum, OrderNum, NumOrdered, QuotedPrice)
|
A-ha! The RepNum is now a non-candidate key determinant.
|
Make a Rep table!
|
|
Here we go.
|
That's more like it.
|
Part: (PartNum, OnHand, Class, Warehouse)
Order: (OrderNum, OrderDate, CustomerNum)
Customer: (CustomerNum, CustomerName, RepNum)
Rep: (RepNum, RepName)
PartOrder: (PartNum, OrderNum, NumOrdered, QuotedPrice)
|
We're now done.
|
Yes, here's what the whole thing eventually boils down to.
|
First what's left of Part:
|
|
Next, what was generated through successive splitting of the tables:
Notice that the last two tables came out of
|
Ah, so this contains some data too.
|
Yes. Now try to answer the following questions.
|
- List the number and name of all sales reps.
SELECT RepNum, LastName, FirstName
FROM Rep;
- List the complete Customer table.
SELECT *
FROM Customer
- List the number and name of every customer
represented by sales rep 20.
SELECT CustomerNum, CustomerName
FROM Customer
WHERE RepNum = '20';
- List the number and name of all customers that
are represented by sales rep 20 and have credit limits of
$7,500.
SELECT CustomerNum, CustomerName
FROM Customer
WHERE RepNum = '20'
AND CreditLimit = 7500;
- List the number and name of all customers that are
represented by sales rep 20 or have credit limits of $7,500.
SELECT CustomerNum, CustomerName
FROM Customer
WHERE RepNum = '20'
OR CreditLimit = 7500;
- For each order, list the order number, order date,
the number of the customer that placed the order, and the name
of the customer that placed the order.
SELECT OrderNum, OrderDate, Orders.CustomerNum, CustomerName
FROM Orders, Customer
WHERE Orders.CustomerNum = Customer.CustomerNum;
- List the number and name of all customers represented by
Valerie Kaiser.
SELECT CustomerNum, CustomerName
FROM Customer, Rep
WHERE Customer.RepNum = Rep.RepNum
AND LastName = 'Kaiser'
AND FirstName = 'Valerie';
- How many customers have a credit limit of $7,500?
SELECT COUNT(*)
FROM Customer
WHERE CreditLimit = 7500;
- Find the total of the balances for all customers
represented by sales rep 65.
SELECT SUM(Balance)
FROM Customer
WHERE RepNum = '65';
- Give the part number, description, and on-hand value
OnHand * Price
for each part in item class AP.
SELECT PartNum, Description, OnHand * Price
FROM Part
WHERE Class = 'AP';
- List all columns and all rows in the
Part table. Sort the results by part description.
SELECT *
FROM Part
ORDER BY Description;
- List all columns and all rows in the Part
table. Sort the results by part number within item
class.
SELECT *
FROM Part
ORDER BY Class, PartNum;
- List the item class and
the sum of the value of parts on
hand. Group the results by item class.
SELECT Class, SUM(OnHand)
FROM Part
GROUP BY Class;
- Create a new table named Appliance to contain the
columns PartNum, Description, OnHand, Warehouse, and Price
for all rows in which the item class is AP.
SELECT PartNum, Description, OnHand, Warehouse, Price
INTO Appliance
FROM Part
WHERE Class='AP';
- In the Appliance table, change the description
of part KL62 to "Electric Dryer".
UPDATE Appliance
SET Description='Electric Dryer'
WHERE PartNum='KL62';
- In the Applicance table, delete every row in which the
warehouse number is 3.
DELETE
FROM Appliance
WHERE Warehouse='3'
The basic idea is this:
- try to find the answer by hand
- then check your answer using SQL
|
We can't use SQL yet.
|
|
That's true: we first need to install MySQL.
|
Once we do that, we need to define a database and populate it with data.
|
|
Well, all of that: next week.
|
I can hardly wait.
|
Updated by Adrian German for A348/A548