CSCI A114 / INFO I111
Lecture Notes Nine

First semester 2000-2001


Midterm Review.

We review the relational data model as a basis for data definition and data manipulation.

The Relational Data Model

The mathematical concept underlying the relational model is the set-theoretic relation, which is a subset of the Cartesian product of a list of domains.

A domain is simply a set of values. For example:

The Cartesian product of domains D1, D2, ..., Dk, written
D1 x D2 x ... x Dk
is the set of all k-tuples (v1, v2, ..., vk) such that v1 is in D1, v2 is in D2, and so on.

For example, if

D1 x D2 is
{(0, a), (0, b) (0, c), (1, a), (1, b), (1, c)}.
A relation is any subset of the Cartesian product of one or more domains.

As far as databases are concerned, it is pointless to discuss infinite relations, so we shall assume that a relation is finite unless we state otherwise.

For example,

{(0, a), (0, c), (1, b)}
is a relation, a subset of D1 x D2 defined above.

The empty set is another example of a relation.

The members of a relation are called tuples.

Each relation that is a subset of

D1 x D2 x ... x Dk
is said to have arity k.

A tuple

(v1, v2, ..., vk)
has k components; the ith component is vi.

Often we drop the parens when we denote a tuple.

It helps to view a relation as a table, where each row is a tuple and each column corresponds to one component. The columns are often given names, called attributes.

Example: Here's a relation whose attributes are CITY. STATE, and POP.

The arity of the relation is three.

For example:

(Miami, Oklahoma, 13880)
is a tuple. Here's the entire relation:

CITY STATE POP
San Diego Texas 4490
Miami Oklahoma 13880
Pittsburg Iowa 509
An Alternative Formulation of Relations

If we attach attribute names to columns of a relation, then the order of the columns becomes unimportant. I mathematical terms we view tuples as mappings from attributes' names to values in the domains of the attributes. This change in viewpoint makes certain relations equal that were not equal under the more traditional definition of a relation.

Example: two versions of the same relation in the set-of-mappings point of view below.

CITY STATE POP
San Diego Texas 4490
Miami Oklahoma 13880
Pittsburg Iowa 509
STATE POP CITY
Texas 4490 San Diego
Oklahoma 13880 Miami
Iowa 509 Pittsburg

For example, as a mapping f, the tuple

(Buffalo, W. Va, 831)
is defined by

Note that the order in which the tuples are listed makes no difference in either viewpoint. However, in the traditional view of a tuple as a list of values, the tuples

(Buffalo, W. Va, 831)
and
(W. Va, 831, Buffalo)
would not be the same, and the two relations above would not be considered the same.

As existing relational systems allow the printing of columns of a relation in any order, we shall take the set-of-mappings definition of relations as the standard one. However, there are situations, such as when we discuss relational algebra later, where we shall want to use the set-of-lists definition for relations. Fortunately, there is an obvious method of converting between the two viewpoints. Given a relation in the set-of-lists sense, we can give arbitrary attribute names to its columns, whereupon it can be viewed as a set of mappings. Conversely, given a relation in the set-of-mappings sense, we can fix an order for the attributes and convert it to a set of lists.

Relation Schemes

The list of attribute names for a relation is called the relation scheme

Representing Data in the Relational Model

The collection of relation schemes used to represent information is called a (relationl) database scheme, and the current values of the corresponding relations is called the (relational) database.

We are, of course, free to create relations with any set of attributes for a relation scheme, and we can place any interpretation we wish on tuples. However, we can observe the typical pattern of usage if we recall one discussion of the entity-relationship model. The data of an entity-relationship diagram is represented by two sorts of relations:

  1. An entity set can be represented by a relation whose relation scheme consists of all the attributed of the entity set. If this entity set is one whose entities are identified by a relationship with some other entity set, then the relation scheme also has the attributes in the key for the second entity set, but not its non-key attributes. Each tuple in the relation represents one entity in the entity set.

  2. A relationship among entity sets
    E1, E2, ..., Ek
    is represented by a relation whose relation scheme consists of the attributes in the keys for each of
    E1, E2, ..., Ek
    We assume, by renaming attributes if necessary, that no two entity sets have attributes with the same name. A tuple t in this relation denotes a list of entities
    e1, e2, ..., ek
    where ei is a member of set Ei for each i. That is, ei is the unique entity in Ei whose attribute values for the key attributes of Ei are found in the components of tuple t for these attributes. The presence of tuple t in the relation indicates that the entities
    e1, e2, ..., ek
    are related by the relationship in question.

Example: Let us explore a database that records baseball players, the teams they played for, their batting averages and positions played. Before showing how data of this nature can be represented as relations, let us consider the entity-relationship diagram that represents the "real world" as it pertains to this example.

The entities are:

  1. PLAYERS with attributes:
    NAME
    HOME   /* place of birth */ 
    BDATE  /* date of birth */ 
    Attribute NAME is the key.

  2. POSITIONS with attributes:
    POSNAME    /* e.g., pitcher */
    POSNUMBER  /* 1 for pitcher, 2 for catcher ... */ 
    We'll take POSNAME to be the key.

  3. TEAMS with attributes:
    FRANCHISE /* explained below */ 
    CITY 
    YEAR 
    The FRANCHISE attribute is a unique identifier we give to a baseball franchise. As a franchise is a property, it has existence as an entity even when it moves to another city or changes its name (e.g., the Cincinnati "Reds" have been called "Redlegs" for two different periods of their history). In all example cases, we use the current name of the franchise as the value of FRANCHISE. A team (as opposed to a franchise) is the collection of players, coaches, etc., working for a franchise in a given year. The key for the entity set TEAM is FRANCHISE and YEAR.

  4. BA, the set of batting averages.

    This entity set has one attribute, PCT, whose values are three digit decimal numbers between 0 and 1. Clearly, PCT is the key.

We should also discuss the following relationships:

  1. The relationship SEASON between PLAYERS, TEAMS, and BA. Notice that this is a ternary relationship (arity 3). It is many-one from PLAYERS and TEAMS to BA in the sense that given a player and a team that player has a unique batting average. Recall that a "team" exists for one particular year, so the typical player is on many teams and has many batting averages, but just one batting average per year.

  2. The relationship PLAYS between PLAYERS and POSITIONS. This is a many-many relationship indicating what positions were played by a player, over the course of his career.

Implementing a Relational Database

Draw the entity-relationship diagram for the problem outlined above.

Then select relation schemes to represent the entity sets and their relationships.

Operations on Relational Databases

The basic operations are:

  1. insert
  2. delete
  3. modify, and
  4. lookup

Interogation of a Relational Database

The first three add, delete or change the contents of the data.

The last one presents additional views to it, without making changes to the data.

Suggested Exercises

Think of the following queries:

The process of following logically connected data from relation to relation in order to obtain desired information is called navigation. Implementors are concerned with this, and the query language that we are using should provide it transparently.

Other exercises: here's a E-R diagram.

List the design of all the tables in the database that models this diagram.

Data Manipulation Languages for the Relational Model

Query languages for the relational model break down into two broad classes:

  1. Algebraic languages, where queries are expressed by applying specialized operators to relations, and

  2. Predicate calculus languages, where queries describe a desired set of tuples by specifying a predicate the tuples must satisfy.

These abstract query languages are not implemented exactly as stated in any existing DBMS, but they serve as a benchmark for evaluating existing systems.

Relational Algebra

There are five basic operations that serve to define relational algebra.

  1. Union

  2. Set difference

  3. Cartesian product

  4. Projection

  5. Selection

Some Additional Algebraic Operations

  1. Intersection

  2. Quotient

  3. Join

  4. Natural join

Suggested Exercises

To be posted here tonight.


Last updated: September 26, 2000 by Adrian German for A114/I111