CSCI A114 / INFO I111
Lecture Notes Five

First semester 2000-2001


Database Design and Implementation (II).

Last time we reviewed what a relational database management system is, and we underlined the fact that everything revolves around tables. We will see that this perspective can be quite effective, in the exercises that we will undertake today.

Now we want to look at a few more queries from the textbook, some from the assignment, and also make a presentation on the lab notes for today and tomorrow, posted last night. Recall that a database is like an electronic file cabinet, in which you can create new files, or delete files you no longer need, and you can insert, query, update and delete the data that you store in the files. In our relational data model perspective files are tables.

Here are a few more examples that we will discuss briefly:

1. List the names of the teams in the basketball league.

2. What are the names of the teams founded in 1996?

3. What are the names of the teams in Illinois or Indiana?

4. What are the names of the teams in California founded before 1980?

5. What are the names of the teams in Indiana or California founded before 1980?

6. What are the names of the teams in the Western Conference?

7. What are the names and cities of the teams that are in the Western Conference or in the Eastern Conference and founded before 1985?

8. What are the values for all fields for teams in California?

9. What are the IDs of players who have scored more than 30 points in a game during the 1997/98 season?

Just like in ther examples above we need to assume a certain structure for the table. In the book (see page 1-4) each game for a player is listed as a record in the table, so if we have this data the query is very easy. The next one, however, is a bit trickier.
10. What are the IDs of players who have scored more than one point per minute during any game played during the 1997/98 season?
The table structure being the same we need to assume that if a player does not participate in a game he is not listed in the table, so we can't have a value of 0 (zero) for the number of minutes. This way we can create a query with a calculated field that contains the result of dividing the number of points by the number of minutes, and select only those tuples (rows, records) that have a value bigger than 1 in this field.
11. What are the IDs and points per minute for players who have scored more than 1 point per minute and either had 6 fouls or no fouls in a game played during the 1997/98 season?
We have two ways of doing this, and the second way is the novel way.
12. What are the names of players who did not attend college?

The key here is to look for records that have a value of NULL in that field.
13. How many teams are in the database?
Use COUNT.
14. How many technical fouls were called in 1997/98?
Use SUM.
15. What are the average salaries for the players on the Chicago Bulls in 1997?
Use a SELECT, followed by a PROJECT, and AVG.
16. What was the highest scoring game for the 1997/98 season?
Use MAX on a calculated field (visitors + host team # of points)
17. What are the IDs of players who have scored more than 1 point per minute during any game in the 1997/98 season?
Ideally we want to see each ID once and only once. Use DISTINCT.
18. How many players scored more than 20 points during any game played during the 1997/98 season?
SELECT those players, use DISTINCT, COUNT the resulting records.
19. In how many games at least one player scored at least 20 points in the 1997/98 season?

This only differs in the fact we're looking for games rather than player IDs.
20. List the teams in the league sorted in alphabetical order.
Sorting alphabetically is also know as ascending lexicographic order.
21. List the scoring averages and player IDs for 1997/98 season in descending order of scoring averages.

Easy.
Now the queries from the assignment that were a bit more difficult.

Eleven was wasy. I hope everybody agrees to that.

But Ten was not that easy.

Eleven lists the average hourly rate for a worker.

Ten needs to show the names of those workers that have an hourly rate bigger than what Eleven returns. We could run Eleven by hand, write down the average hourly rate, and then code the query (Ten) as a SELECT on those tuples where hourly rate is bigger than the result returned by Eleven.

To be able to achieve that we only need to add Eleven as the input (along with the Worker table) to Ten. There's nothing unusual with this, operations start from tables and produce tables. So in this case Eleven starts from Worker (which is a plain table) and produces a table that is dynamically generated and called Eleven. This new table (of one row) and the original Worker table is the input for the Ten query.

This produces a table that contains only those workers whose hourly rates are bigger than the average hourly rate that Eleven contains.

Eight was easy, but Nine was more involved. Having Eight we can write Nine, if we could get our hands on the average balance in the database. So we could create a NineHelper that produces the desired average, and starting from Eight and NineHelper we could produce Nine.

Always remember to look at the SQL when you're done with a query.

Notice that the last example also involves a JOIN.

Quick Review of the Lab

The lab focuses on data management. We add, change, and delete records in the BMS database.

Searching for a record resembles the same process in a word processor. Filtering can restrict the output to only those desired tuples. This works almost like a SELECT.

Tables' structure also need to be updated once in a while: fields can be dropped, or added, or their properties can change.

Update queries can be used to update data in a table.

Delete queries can be used to delete records in a table.

To help maintain consistency a variety of tools are available:

Elements of Database Modeling

A Model of the Real World

Let us introduce an informal model called the entity-relationship model of data. This model is not a data model that has been used in data definition languages, although it is closely related to some of these models. Rather the entity-relationship model will serve to justify the kinds of data structures and data models we introduce later, since the ability of these structures and models to represent entity-relationship structures will be apparent.

It will also be intuitively clear that the entity-relationship model does an adequate, albeit imperfect, job of modeling real-world situations, such as business enterprises or the records kept by schools, hospitals, governments, and so on, where database systems are used. However, if we view the structures defined in the entity-relationship model as conceptual schemes, we shall not be grossly deceived.

Entities

The term "entity" defines an all-inclusive definition. Suffice it to say that an entity is a thing that exists and is distinguishable; that is, we can tell one entity from another. For example, each chair is an entity. So is each person and each automobile. We could regard each ant as an entity if we had a way to distinguish one from another; otherwise we would not regard an ant as an entity. Higher-level concepts can be entities too. For example, in a biological database, terms like Arachnid, Rodent, Baboon, and Plant would be entities. If we stretch a point, concepts like love and hate are entities.

A group of all similar entities form an entity set. Examples of entity sets are:

  1. all flowers
  2. all blue flowers
  3. all automobiles
  4. all emotions

Notice from the first two examples that the term "similar entities" is not precisely defined, and one can establish an infinite number of different properties by which to define an entity set. One of the key steps in selecting a model for the real world, as it pertains to a particular database, is choosing the entity sets.

In our examples we have the following entities:

  1. Marinas and Technicians (in one case),
  2. Customers and Workers (in the other).

Attributes and Keys

Entities have properties, called attributes which associate a value from a domain of values for that attribute with each entity in an entity set. Usually, the domain for an attribute will be a set of integers, real numbers, or character strings, but we do not rule out other types of values. For example, the entities in the entity set of persons may be said to have attributes such as name (a character string), height (a real number), and so on.

The selection of relevant attributes for entity sets is another critical step in the design of a real-world model. An attribute or set of attributes whose values uniquely identify each entity in an entity set is called a key for that entity set. In principle, each entity set has a key, since we hypothesized that each entity is distinguishable from all others. But if we do not choose, for an entity set, a collection of attributes that include a key, then we shall not be able to distinguish one entity in the set from another. Often an arbitrary serial number is supplied as an attribute to serve as a key. For example, an entity set that included only U.S. nationals could use the single attribute "Social Security Number" as a key.

In our cases we have used Marina Number, Tech Number, fields and such.

Relationships

A relationship among entity sets is simply an ordered list of entity sets.

If a certain relationship REL exists among entity sets E1, E2, ..., Ek, then it is presumed that a set of k-tuples named REL exists. We call such a set a relationship set. The most common case, by far, is where k = 2, but lists of three or more entity sets are sometimes related.

In our examples we have Technician entities servicing Marina entities, and Worker entities also servicing Customer entities.

Functionality

To implement a database efficiently, it is often necessary to classify relationships according to how many entities from one entity set can be associated with how many entities of another entity set. The simplest and rarest form of relationship on two sets is one-to-one, meaning that for each entity in either set there is at most one associated member of the other set.

An example of such a situation might occur in the database of a business, where two entity sets EMPLOYEES and DEPARTMENTS exist. The relationship HEAD_OF between these two entity sets, indicating the department head of each department, might be assumed to be a one-to-one relationship. Note that the one-to-oneness of this relationship is an assumption about the real world that the database designer could choose to make or make not. It is just as possible, in fact even more plausible, to assume that the same person could head two departments, or even that a department could have two heads.

However, if one head for one department is the rule in this organization, then it may be possible to take advantage of the fact that HEAD_OF is one-to-one, when designing the physical database. Also, observe that a one-to-one relationship does not imply that for every entity of one set there actually exists a related entity of the other set. For example, certainly most employees are not head of any department, and there may be departments that at a given time have no head.

More common is the many-one relationship, where one entity in set E2 is associated with zero or more entities in set E1, but each entity in E1 associated with at most one entity in E2. This relationship is said to be many-one from E1 to E2. That is, the relationship is a partial function fro E1 to E2. For example, if each course is taught by one teacher, there is a many-one relationship TAUGHT_BY from entity set COURSES to entity set TEACHERS.

Also common is the many-many relationship, where there are no restrictions on the sets of pairs of entities that may appear in a relationship set. An example of such a relationship (many-many) is EXPORTS between entity sets COUNTRIES and PRODUCTS, since a country usually exports more than one product, and few products are exported by only one country.

We have a one-to-many relationship SERVICES from the Technician table (entity set) to the Marina table.

Entity-Relationship Diagrams

It is useful to summarize the information in a design using entity-relationship diagrams, where:

  1. Rectangles represent entity sets

  2. Circles represent attributes. They are linked to their entity sets by undirected edges.

  3. Diamonds represent relationships. They are linked to their constituent entity sets by undirected edges.

Here's the entity-relationship diagram for the Bavant Marine Services database:


Last updated: September 12, 2000 by Adrian German for A114/I111