Fall Semester 2002
A die (plural: dice) is a small cube marked on each face with from one to six spots and used usually in pairs in various games and in gambling by being shaken and thrown to come to rest at random on a flat surface. Let's visualize that, if you will:
When we toss a die in an experiment, we get one of the following possible outcomes:
E = { 1, 2, 3, 4, 5, 6 }
To be more general we could denote this set of outcomes as follows:
E = { e1, e2, e3, e4, e5, e6 }
To be even more general we could abstract the number of events as well:
E = { e1, e2, ..., em }
and keep in mind that for a single die m = 6 and ei = i.
Note that the experiment involves throwing just one die.
The collection of all these events is called the event space for the experiment.
Describe the event space when we throw two dice (it's a set of ordered pairs).
Describe the event space when we throw two dice and are interested in the sum of their points.
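One way to explore the two questions above is to enumerate the event spaces by machine. This is only a sketch; the names `faces`, `pairs` and `sums` are made up for the illustration.

```python
from itertools import product

faces = range(1, 7)  # the six outcomes of a single die

# Two dice: every ordered pair (first die, second die).
pairs = list(product(faces, repeat=2))

# Two dice, sum of points: the distinct values a + b can take.
sums = sorted({a + b for a, b in pairs})

print(len(pairs))  # 36 ordered pairs
print(sums)        # the sums 2 through 12
```

Note that (1, 2) and (2, 1) are different events in the first space but lead to the same event, 3, in the second.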
If the die is unbiased, each of the six events is equally likely every time we perform the experiment.
Assume that we do n > 0 experiments.
Let f (ei) be the frequency of occurrence of event ei in the n experiments.
The experimental probability of event ei in n experiments is defined as
p(ei) = f(ei) / n
It should be immediate that:
p(e1) + p(e2) + ... + p(em) = 1
Prove it. The first part of the lab tomorrow will be probability related.
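Before the lab, the idea can be tried out with a simulated die. The sketch below assumes the usual definition of experimental probability (frequency of the event divided by n). The function name `experimental_probabilities` and the fixed seed are choices made for this illustration only.

```python
import random
from collections import Counter

def experimental_probabilities(n, seed=0):
    """Throw a simulated fair die n times; return {event: frequency / n}."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    throws = [rng.randint(1, 6) for _ in range(n)]
    freq = Counter(throws)             # f(ei) for each observed event
    return {e: freq[e] / n for e in range(1, 7)}

p = experimental_probabilities(600)
print(p)
print(sum(p.values()))  # should be 1, up to floating point rounding
```

With larger n the six values should drift toward 1/6 each, which is what "unbiased" suggests.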
Next, let's review some of the things we did last time.
2. Central Statistics
Let E be a number-based event space and let
X = (x1, x2, ..., xn)
be a list of events from E that originate from n experiments.
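The mean (also called average or expected value) of X is defined as:
mean(X) = (x1 + x2 + ... + xn) / n
If we denote by p(ei) the experimental probability of event ei among the n experiments, then
mean(X) = e1 p(e1) + e2 p(e2) + ... + em p(em)
Prove it. This expression links the theory of probability to the statistical notion of mean.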
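A quick numerical check (not a proof) of the link between the mean and the experimental probabilities might look like this. The list `X` is hypothetical data, and exact fractions are used so that the two results can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

X = [1, 3, 3, 6, 2, 3, 6, 1]  # hypothetical list of eight die throws
n = len(X)

# Mean computed directly from the list: (x1 + ... + xn) / n.
mean_direct = Fraction(sum(X), n)

# Mean computed from experimental probabilities: sum of e * (f(e) / n).
freq = Counter(X)
mean_from_p = sum(e * Fraction(f, n) for e, f in freq.items())

print(mean_direct, mean_from_p)  # both are 25/8
```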
The mean is a statistic of so-called central tendency.
In fact, one can prove that:
(x1 - mean(X)) + (x2 - mean(X)) + ... + (xn - mean(X)) = 0
Prove it. If we take the sum over the differences between each one of the observed events and the mean we obtain a value of 0 (zero). The differences cancel each other out. Hence the mean is the most central value (from this point of view, almost a center of mass) that characterizes the observed events.
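The cancellation can be observed on a small example. This sketch uses made-up data and exact fractions; `deviations_about_mean` is a name invented for the illustration.

```python
from fractions import Fraction

def deviations_about_mean(xs):
    """Return the differences x - mean for every x in xs, as exact fractions."""
    mean = Fraction(sum(xs), len(xs))
    return [x - mean for x in xs]

devs = deviations_about_mean([2, 4, 4, 5, 10])  # the mean is 5
print(devs)       # deviations -3, -1, -1, 0, 5
print(sum(devs))  # 0
```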
Let's summarize here the properties of the mean:
This last property is very important and is used in many areas of statistics,
particularly in regression. Elaborated a little more fully, this property states
that although the sum of the squared deviations about the mean does not usually
equal zero, it is smaller than if squared deviations were taken about any other
value.
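The least-squares property just described can be checked numerically, in the spirit of the lab exercise. The data and the candidate centers below are hypothetical; the sketch only compares a handful of values and is not a proof.

```python
def sum_squared_deviations(xs, c):
    """Sum of squared deviations of the scores in xs about the value c."""
    return sum((x - c) ** 2 for x in xs)

X = [2, 4, 4, 5, 10]
mean = sum(X) / len(X)                       # 5.0
at_mean = sum_squared_deviations(X, mean)    # 36.0

candidates = [3, 4, 4.5, 5, 5.5, 6, 7]
for c in candidates:
    # Every value printed here is at least as large as at_mean.
    print(c, sum_squared_deviations(X, c))
```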
You have tested this property in Lab Two or Three, and in class last Thursday.
There are other measures of central tendency, such as the median and the mode.
Let's review them.
They both rely on the idea of sorting.
Let
X = (x1, x2, ..., xn)
be a list of n experiments.
Since the xj's are numbers we can sort the list X in ascending (increasing) order.
Assume that the result is the list
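(y1, y2, ..., yn)
The median of X is defined as:
median(X) = yk, where k = (n + 1) / 2, if n is odd
median(X) = (yk + y(k+1)) / 2, where k = n / 2, if n is even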
The median has the following property:
The third and last measure of central tendency that we shall discuss is the mode.
The mode is defined as the most frequent score in the distribution. (When all
scores in the distribution have the same frequency, it is customary to say that the distribution has no mode).
Clearly, this is the easiest of the three measures to determine. The mode is found by inspection of the scores; there isn't any calculation necessary.
Usually distributions are unimodal; that is, they have only one mode. However, it is possible for a distribution to have many modes. When a distribution has two modes, the distribution is called bimodal. In general, with more than two modes a distribution is called multimodal.
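Both the median and the mode can be sketched in a few lines once the list is sorted or counted. The code assumes the usual definition of the median (middle value for odd n, average of the two middle values for even n) and a non-empty list. The function names are invented here, and treating a list with a single distinct value as having that value for its mode is a choice made for the sketch.

```python
from collections import Counter

def median(xs):
    """Middle value of the sorted list, or the average of the two middle values."""
    ys = sorted(xs)
    n = len(ys)
    mid = n // 2
    if n % 2 == 1:
        return ys[mid]
    return (ys[mid - 1] + ys[mid]) / 2

def modes(xs):
    """Most frequent scores; [] when several scores all share one frequency."""
    freq = Counter(xs)
    top = max(freq.values())
    if len(freq) > 1 and all(f == top for f in freq.values()):
        return []  # customary "no mode" case
    return sorted(e for e, f in freq.items() if f == top)

print(median([5, 1, 3]), median([4, 1, 3, 2]))  # 3 and 2.5
print(modes([1, 2, 2, 3]))     # [2], unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2], bimodal
print(modes([1, 2, 3]))        # [], no mode
```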
Measures of Central Tendency and Symmetry.
If the distribution is unimodal and symmetrical, then the mean, median and mode will all be equal. When the distribution is skewed, the mean and median will not be equal. Since the mean is most affected by extreme scores, it will have a value closer to the extreme scores than will the median. Thus, with a negatively skewed distribution, the mean will be lower than the median. With a positively skewed curve, the mean will be larger than the median.
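The effect of skew on the mean and the median can be seen on two small made-up lists, one with a long tail to the right and its mirror image. The data are hypothetical and chosen only to make the pull of the extreme score visible.

```python
from statistics import mean, median

positively_skewed = [1, 2, 2, 3, 3, 3, 4, 20]          # long tail to the right
negatively_skewed = [-x for x in positively_skewed]    # mirror image, tail to the left

# Positive skew: the extreme score 20 pulls the mean above the median.
print(mean(positively_skewed), median(positively_skewed))  # 4.75 and 3.0

# Negative skew: the extreme score -20 pulls the mean below the median.
print(mean(negatively_skewed), median(negatively_skewed))  # -4.75 and -3.0
```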
Here's a picture that illustrates the point:
3. Statistics of Dispersion
Measures of Variability
Variability has to do with how far the scores (or values obtained, measured in the experiments) are spread apart. Whereas measures of central tendency are a quantification of the average value of the distribution, measures of variability quantify the extent of dispersion. There are three measures of variability we will be looking at: the range, the variance, and the standard deviation.
The range is defined as the difference between the highest and lowest score in the distribution. The range is easy to calculate but gives us only a relatively crude measure of dispersion, because the range really only measures the spread of the two extreme scores and not the spread of any of the scores in between. (By scores we mean, as usual, observed events, or measurements).
Let
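X = (x1, ..., xn)
be a list of n experiments.
The variance of the list X is defined as follows:
var(X) = ((x1 - mean(X))^2 + ... + (xn - mean(X))^2) / n
Alternatively, the variance of X is defined as the mean of the list
((x1 - mean(X))^2, ..., (xn - mean(X))^2)
that is,
var(X) = mean( ((x1 - mean(X))^2, ..., (xn - mean(X))^2) )
As with the mean, we can define the variance of X in terms of the experimental probabilities p(ei) of the events:
var(X) = (e1 - mean(X))^2 p(e1) + ... + (em - mean(X))^2 p(em)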
The variance is not used much in descriptive statistics because it gives us squared units of measurement. It is used, however, quite frequently in inferential statistics.
The standard deviation of a list of experiments X is defined as the square root of the variance of X:
sd(X) = sqrt(var(X))
The standard deviation is the most frequently encountered measure of variability.
The standard deviation has many important characteristics:
The standard deviation gives us a measure of dispersion relative to the mean. This differs from the range, which tells us directly the spread of the two most extreme scores.
The standard deviation is sensitive to each score in the distribution. If a score is moved closer to the mean, then the standard deviation will become smaller. Conversely, if a score shifts away from the mean, then the standard deviation will increase.
The standard deviation is stable with regard to sampling fluctuations. If samples were taken repeatedly from populations of the type usually encountered in the behavioral sciences, the standard deviation of the samples would vary much less from sample to sample than the range.
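The three measures of variability can be sketched as follows. The code assumes the variance is the mean of the squared deviations (division by n, not n - 1). The function names and the data are invented for the illustration.

```python
from math import sqrt

def value_range(xs):
    """Difference between the highest and the lowest score."""
    return max(xs) - min(xs)

def variance(xs):
    """Mean of the squared deviations about the mean (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Square root of the variance."""
    return sqrt(variance(xs))

X = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(X), variance(X), std_dev(X))  # 7, 4.0, 2.0

# Move the score 9 closer to the mean: the standard deviation shrinks.
closer = [2, 4, 4, 4, 5, 5, 7, 6]
print(std_dev(closer) < std_dev(X))  # True
```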
4. The Normal Curve
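The normal curve is a theoretical distribution of population scores. It is a bell-shaped curve which, like most other curves, has an equation that describes it:
f(x) = (1 / (sigma sqrt(2 pi))) exp(-(x - mu)^2 / (2 sigma^2))
where mu is the mean and sigma is the standard deviation of the population.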
We're going to look at this curve in lab tomorrow, and in lab and lecture all week next week.
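As a preview, the bell shape can be sketched in text form. The code assumes the standard formula for the normal density with mean `mu` and standard deviation `sigma`. The function name and the crude bar plot are choices made for this illustration.

```python
from math import exp, pi, sqrt

def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Crude text plot: one '#' per 0.01 of height, highest at the mean.
for x in [-3, -2, -1, 0, 1, 2, 3]:
    bar = "#" * round(100 * normal_density(x))
    print(f"{x:>2} {bar}")
```

The bars are longest at the center and shrink symmetrically on both sides.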
5. The Normal Curve and Standard Scores
5.1 The Normal Distribution
For many variables, most observations are concentrated near the middle of the distribution. As distance from the central concentration increases, the frequency of observation decreases. Such distributions are called "bell-shaped". An example is the normal distribution.
A broad range of observed phenomena in nature and in society is approximately normally distributed. For example, the distributions of variables such as
Take a look at homework two now.
5.2 Shape of the Normal Distribution
This is only an approximate rendering, but it is still meaningful.
5.3 Area Contained Under The Normal Curve
The areas are indicated in the diagram above.
Please check these numbers in lab.
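One possible way to check areas under the normal curve uses the error function from Python's standard library. The sketch relies on the known identity that the area within k standard deviations of the mean is erf(k / sqrt(2)). The function name `area_within` is made up here.

```python
from math import erf, sqrt

def area_within(k):
    """Area under the normal curve between -k and +k standard deviations."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.27, 95.45 and 99.73 percent.
    print(k, round(100 * area_within(k), 2))
```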
5.4 Standard Scores (z-Scores)
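A z score is a transformed score that designates how many standard deviation units the corresponding raw score is above or below the mean:
z = (x - mean(X)) / sd(X)
where x is the raw score, mean(X) is the mean, and sd(X) is the standard deviation of the list X.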
This transformation results in a distribution having a mean of 0 (zero) and a standard deviation of 1 (one). Again, we will need to verify this in the lab.
An important use of z scores is to compare scores that are not otherwise directly comparable.
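Both claims about z scores can be tried out ahead of the lab. The sketch assumes z = (raw score - mean) / standard deviation, with the standard deviation computed by dividing by n. The data, the two exams, and the function names are hypothetical.

```python
from math import sqrt

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def z_scores(xs):
    """Transform every score in xs using the list's own mean and standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [z_score(x, m, sd) for x in xs]

Z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, standard deviation 2
z_mean = sum(Z) / len(Z)
z_sd = sqrt(sum((z - z_mean) ** 2 for z in Z) / len(Z))
print(Z)
print(z_mean, z_sd)  # 0.0 and 1.0

# Comparing scores from two hypothetical exams with different scales.
exam_a = z_score(85, mean=75, sd=10)  # 1.0
exam_b = z_score(70, mean=60, sd=5)   # 2.0
print(exam_a, exam_b)
```

Although 85 is the higher raw score, the 70 is the better result relative to its own exam, since it lies two standard deviations above that exam's mean.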
Take a look at and think about the new homework again now.
5.5 Characteristics of z-Scores
In lab we will also look at the following transformations:
A113