|
CSCI A114 / INFO I111
|
We review the relational data model as a basis for data definition and data manipulation.
The Relational Data Model
The mathematical concept underlying the relational model is the set-theoretic relation, which is a subset of the Cartesian product of a list of domains.
A domain is simply a set of values. For example:
D1 x D2 x ... x Dkis the set of all k-tuples (v1, v2, ..., vk) such that v1 is in D1, v2 is in D2, and so on.
For example, if
{(0, a), (0, b) (0, c), (1, a), (1, b), (1, c)}.A relation is any subset of the Cartesian product of one or more domains.
As far as databases are concerned, it is pointless to discuss infinite relations, so we shall assume that a relation is finite unless we state otherwise.
For example,
{(0, a), (0, c), (1, b)}is a relation, a subset of D1 x D2 defined above.
The empty set is another example of a relation.
The members of a relation are called tuples.
Each relation that is a subset of
D1 x D2 x ... x Dkis said to have arity k.
A tuple
(v1, v2, ..., vk)has k components; the ith component is vi.
Often we drop the parens when we denote a tuple.
It helps to view a relation as a table, where each row is a tuple and each column corresponds to one component. The columns are often given names, called attributes.
Example: Here's a relation whose attributes are CITY. STATE, and POP.
The arity of the relation is three.
For example:
(Miami, Oklahoma, 13880)is a tuple. Here's the entire relation:
An Alternative Formulation of Relations
CITY STATE POP San Diego Texas 4490 Miami Oklahoma 13880 Pittsburg Iowa 509
If we attach attribute names to columns of a relation, then the order of the columns becomes unimportant. I mathematical terms we view tuples as mappings from attributes' names to values in the domains of the attributes. This change in viewpoint makes certain relations equal that were not equal under the more traditional definition of a relation.
Example: two versions of the same relation in the set-of-mappings point of view below.
|
|
For example, as a mapping f, the tuple
(Buffalo, W. Va, 831)is defined by
Note that the order in which the tuples are listed makes no difference in either viewpoint. However, in the traditional view of a tuple as a list of values, the tuples
(Buffalo, W. Va, 831)and
(W. Va, 831, Buffalo)would not be the same, and the two relations above would not be considered the same.
As existing relational systems allow the printing of columns of a relation in any order, we shall take the set-of-mappings definition of relations as the standard one. However, there are situations, such as when we discuss relational algebra later, where we shall want to use the set-of-lists definition for relations. Fortunately, there is an obvious method of converting between the two viewpoints. Given a relation in the set-of-lists sense, we can give arbitrary attribute names to its columns, whereupon it can be viewed as a set of mappings. Conversely, given a relation in the set-of-mappings sense, we can fix an order for the attributes and convert it to a set of lists.
Relation Schemes
The list of attribute names for a relation is called the relation scheme
Representing Data in the Relational Model
The collection of relation schemes used to represent information is called a (relationl) database scheme, and the current values of the corresponding relations is called the (relational) database.
We are, of course, free to create relations with any set of attributes for a relation scheme, and we can place any interpretation we wish on tuples. However, we can observe the typical pattern of usage if we recall one discussion of the entity-relationship model. The data of an entity-relationship diagram is represented by two sorts of relations:
E1, E2, ..., Ekis represented by a relation whose relation scheme consists of the attributes in the keys for each of
E1, E2, ..., EkWe assume, by renaming attributes if necessary, that no two entity sets have attributes with the same name. A tuple t in this relation denotes a list of entities
e1, e2, ..., ekwhere ei is a member of set Ei for each i. That is, ei is the unique entity in Ei whose attribute values for the key attributes of Ei are found in the components of tuple t for these attributes. The presence of tuple t in the relation indicates that the entities
e1, e2, ..., ekare related by the relationship in question.
Example: Let us explore a database that records baseball players, the teams they played for, their batting averages and positions played. Before showing how data of this nature can be represented as relations, let us consider the entity-relationship diagram that represents the "real world" as it pertains to this example.
The entities are:
PLAYERS with attributes: AttributeNAME HOME /* place of birth */ BDATE /* date of birth */
NAME is the key.
POSITIONS with attributes: We'll takePOSNAME /* e.g., pitcher */ POSNUMBER /* 1 for pitcher, 2 for catcher ... */
POSNAME to be the key.
TEAMS with attributes: TheFRANCHISE /* explained below */ CITY YEAR
FRANCHISE attribute is a unique identifier we give to a baseball franchise.
As a franchise is a property, it has existence as an entity even when it moves to another city
or changes its name (e.g., the Cincinnati "Reds" have been called "Redlegs" for two different
periods of their history). In all example cases, we use the current name of the franchise as
the value of FRANCHISE. A team (as opposed to a franchise) is the collection
of players, coaches, etc., working for a franchise in a given year. The key for the entity set
TEAM is FRANCHISE and YEAR.
BA, the set of batting averages.
This entity set has one attribute, PCT, whose values are three digit decimal numbers
between 0 and 1. Clearly, PCT is the key.
We should also discuss the following relationships:
SEASON between PLAYERS, TEAMS, and BA.
Notice that this is a ternary relationship (arity 3). It is many-one from PLAYERS and TEAMS
to BA in the sense that given a player and a team that player has a unique batting average. Recall that
a "team" exists for one particular year, so the typical player is on many teams and has many batting averages, but
just one batting average per year.
PLAYS between PLAYERS and POSITIONS. This is a many-many
relationship indicating what positions were played by a player, over the course of his career.
Implementing a Relational Database
Draw the entity-relationship diagram for the problem outlined above.
Then select relation schemes to represent the entity sets and their relationships.
Operations on Relational Databases
The basic operations are:
Interogation of a Relational Database
The first three add, delete or change the contents of the data.
The last one presents additional views to it, without making changes to the data.
Suggested Exercises
Think of the following queries:
Other exercises: here's a E-R diagram.
List the design of all the tables in the database that models this diagram.
Data Manipulation Languages for the Relational Model
Query languages for the relational model break down into two broad classes:
Relational Algebra
There are five basic operations that serve to define relational algebra.
Suggested Exercises
To be posted here tonight.
A114/I111