Fall Semester 2002
At this point I will trust that you have read Chapters 1 and 2 from Kirkup.
This week the reading assignment is Chapter 3 from Kirkup.
And now, let's start bringing in some definitions.
In our discussions this semester we shall be using certain technical terms.
The terms and their definitions will now be given.
whose values have been shown here to the 6th decimal.
To illustrate, an investigator might be interested in the effect of alcohol on social behaviour. In this example, the experimenter manipulates the amount of alcohol consumed by the subjects and measures its effect on their social behaviour. Alcohol amount is the independent variable.
In another experiment, the effect of sleep deprivation on aggressive behaviour is studied. Subjects are deprived of various amounts of sleep, and the consequences on aggressiveness are observed. Here, the amount of sleep deprivation is manipulated. Hence, it is the independent variable.
For example, in the experiment studying the effects of alcohol on social behaviour, amount of alcohol is the independent variable. The social behaviour of the subjects is measured to see if it is affected by the amount of alcohol consumed. Social behaviour is therefore the dependent variable.
In the investigation of sleep deprivation and aggressive behaviour, the amount of sleep deprivation is being manipulated, and the subjects' aggressive behaviour is being measured. Amount of sleep deprivation is the independent variable, and aggressive behaviour is the dependent variable.
In research, data is often collected on a sample of subjects rather than on the entire population to which the results are intended to apply. Ideally, of course, the experiment would be performed on the whole population, but that is usually far too costly (in time, money, etc.), so a sample is taken. Note that not just any sample will do (see the Wald example below).
The sample should be a random sample. Random sampling will be discussed later. For now, it is sufficient to know that random sampling allows the laws of probability to apply to the data and at the same time helps achieve a sample that is representative of the population. Thus, the results obtained from the sample should also apply to the population. Once the data is collected, it is statistically analyzed, and the appropriate conclusions are drawn.
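Until random sampling is discussed properly, the idea can be sketched in a few lines of Python. The population below is made up for illustration (10,000 simulated scores, not course data), and the sample size of 200 is an arbitrary choice. The point is only that a simple random sample, in which every subject is equally likely to be chosen, tends to have a mean close to the population mean.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated scores (not data from the course).
rng = random.Random(2002)  # fixed seed so the run is repeatable
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Simple random sample without replacement: every subject has the same
# chance of ending up in the sample.
sample = rng.sample(population, 200)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

A sample that is not drawn at random, such as one made up only of volunteers, carries no such guarantee.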
And now, let's look at the announced example:
During World War II many economists, mathematicians, and statisticians were members of Columbia University's Statistics Research Group, which did high-level consulting work for the armed services. As part of this group's work, statistician Abraham Wald was asked where to place armor on planes. It seemed obvious to the aircraft engineers that armor was needed at the places most frequently hit, as found in a large sample of battle-proven airplanes. After studying the bullet holes in a sample of returning planes, Wald concluded that the armor should go where bullet holes were least frequently found in these planes, and that's what he recommended. Well, we'll think about this, and about another example.
Now the questions:
- Was his reasoning justified?
- Was there anything wrong with the aircraft engineers' sampling design?
- Did they overlook anything?
ABC's 20/20 television broadcast on July 16, 1993 reported on a study in which individuals who had lived to be 100 years of age or more were queried in the hope of finding common characteristics. The implication was drawn that if a younger person worked at acquiring the characteristics shared by these centenarians, then that person's probability of reaching such an old age would increase. Why was this study design inappropriate for the implication drawn?
Self-selected samples can be misleading.
Data analysis (or statistical analysis) has been divided into two areas: descriptive statistics and inferential statistics.
Since all of these procedures are for the purpose of describing or characterizing the data already collected, they fall within the realm of descriptive statistics. Inferential statistics, on the other hand, is not concerned with just describing the obtained data. Rather, it embraces techniques that allow one to use the obtained sample data to draw conclusions about populations.
FREQUENCY DISTRIBUTIONS
When grouping data, one of the important issues is how wide each interval should be. Whenever data is grouped, some information is lost. The wider the interval, the more information is lost. In practice one usually determines the interval width by dividing the distribution into 10 to 20 intervals. Alternatively, one could use more complicated formulas and rules of thumb, such as taking the number of intervals to be the integer that just exceeds
$2N^{0.33}$, where N is the total number of values in the data set.
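As a rough sketch of this rule of thumb, the snippet below assumes that the rule gives the number of intervals as the smallest integer above 2·N^0.33, and that the interval width is then the range of the data divided by that number. The function names and the scores are made up for illustration.

```python
import math

def rule_of_thumb_intervals(n):
    """Smallest integer that strictly exceeds 2 * n**0.33."""
    return math.floor(2 * n ** 0.33) + 1

def interval_width(values, n_intervals):
    """Width needed to cover the data with n_intervals equal intervals."""
    return (max(values) - min(values)) / n_intervals

# Hypothetical exam scores, for illustration only.
scores = [55, 62, 68, 71, 74, 77, 80, 83, 86, 92,
          58, 64, 69, 72, 75, 78, 81, 84, 88, 95]

k = rule_of_thumb_intervals(len(scores))
print("number of intervals:", k)
print("interval width:", interval_width(scores, k))
```

In practice the width is usually rounded to a convenient value, so this is a starting point rather than a final answer.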
Let's look at some more definitions.
PERCENTILES
A percentile or percentile point is the value on the measurement scale below which a specified percentage of the scores in the distribution fall. Percentiles are measures of relative standing. Thus, the 60th percentile point is the value on the measurement scale below which 60% of the scores in the distribution fall. Sometimes we are faced with the situation where we want to know the percentile rank of a raw score. For example, since your score on the exam was 86, it would be useful to you to know the percentile rank of 86.
The percentile rank of a score is the percentage of scores lower than the score in question. This situation is just the reverse of the one where we were calculating the percentile point.
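The definition of percentile rank translates directly into a short function. This is a minimal sketch that works on raw (ungrouped) scores and counts only scores strictly lower than the one in question. Textbook procedures for grouped data interpolate within an interval instead and can give slightly different answers. The scores are made up.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the distribution that are lower than x."""
    below = sum(1 for s in scores if s < x)
    return 100.0 * below / len(scores)

# Hypothetical exam scores, for illustration only.
scores = [55, 62, 68, 71, 74, 77, 80, 83, 86, 92]

print(percentile_rank(scores, 86))  # 8 of the 10 scores are lower: 80.0
```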
GRAPHING FREQUENCY DISTRIBUTIONS
We have several tools:
Frequency distributions of nominal or ordinal data are customarily plotted as a bar graph (or as a pie chart). Since there is no numerical relationship between the categories in nominal data, the various groups can be arranged along the horizontal axis in any order. Unlike the bars of a histogram, the bars of a bar graph need not touch each other. This further emphasizes the lack of a quantitative relationship between the categories.
The histogram is used to represent frequency distributions composed of interval or ratio data. It resembles the bar graph, except that with the histogram a bar is drawn for each class interval (or bin). The class intervals are plotted on the horizontal axis such that each class bar begins and terminates at the real limits of the interval. The height of the bar corresponds to the frequency of the class interval. Since the intervals are continuous, the vertical bars must touch each other, rather than being spaced apart as is done with the bar graph.
The frequency polygon is like a histogram, except that a point is plotted over the midpoint of each interval at a height corresponding to the frequency of the interval. The points are then joined with straight lines.
Cumulative frequency and cumulative percentage distributions may also be presented in graphical form; the latter are more often used.
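All of these graphs can be drawn from one table of class intervals. The sketch below builds such a table: the frequencies give the bar heights of a histogram, the midpoints with the frequencies give the points of a frequency polygon, and the last column gives a cumulative percentage curve. As a simplifying assumption, each interval includes its lower limit and excludes its upper limit, rather than using real limits. The function name and scores are made up.

```python
def frequency_table(values, lower, width, n_bins):
    """Rows of (lower, upper, midpoint, freq, cum_freq, cum_pct)."""
    upper = lower + width * n_bins
    counts = [0] * n_bins
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} lies outside [{lower}, {upper}]")
        # A value equal to the top limit is placed in the last interval.
        index = min(int((v - lower) // width), n_bins - 1)
        counts[index] += 1

    rows = []
    cumulative = 0
    for i, freq in enumerate(counts):
        lo = lower + i * width
        cumulative += freq
        cum_pct = 100.0 * cumulative / len(values)
        rows.append((lo, lo + width, lo + width / 2, freq, cumulative, cum_pct))
    return rows

# Hypothetical exam scores, for illustration only.
scores = [55, 62, 68, 71, 74, 77, 80, 83, 86, 92]
table = frequency_table(scores, lower=50, width=10, n_bins=5)

for lo, hi, mid, freq, cum, pct in table:
    print(f"{lo}-{hi}  midpoint {mid}  f={freq}  cum f={cum}  cum %={pct}")
```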
SHAPES OF FREQUENCY CURVES
The curve in the middle is symmetrical. The one on the left is negatively skewed, the one on the right is positively skewed. This will become clearer after we define the three measures of central tendency:
The mean is the sum of the scores divided by the number of scores.
The median is the scale value below which 50% of the scores fall.
It is therefore the same thing as the percentile point for 50% (P50).
The mode is the most frequent score in the distribution.
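The three measures can be compared on a small data set with Python's standard `statistics` module. The mean is taken here to be the ordinary arithmetic average, and the scores are made up so that one large value gives the distribution a long right tail (positive skew). In such a case the mean is typically pulled toward the tail, while the median and mode are not.

```python
import statistics

# Hypothetical, positively skewed scores: one large value in the right tail.
scores = [2, 3, 3, 4, 5, 6, 19]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle score of the sorted list
mode = statistics.mode(scores)      # most frequent score

print("mean:", mean)      # 42 / 7 = 6
print("median:", median)  # 4
print("mode:", mode)      # 3
```

Here the mode is less than the median, which is less than the mean, the ordering usually seen with positive skew.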
Homework One, which was due last week, tried to help you clearly distinguish the relative merits of each of these three measures of central tendency. (Please try to match these notes with your reading assignments from Kirkup.)
Homework Two is based on what will be discussed this week.
(And Homework Three will focus more on the same topic).
Here now are some answers from last year that should help you with the minute papers of today:
Last time we looked at some measures of central tendency.
Let's now take a look at MEASURES OF VARIABILITY
1. The Range.
The range is defined as the difference between the highest and lowest score in the distribution.
2. Deviation Scores.
A deviation score tells how far away the raw score is from the mean of its distribution.
3. The Standard Deviation.
For a population of scores we have:
For a sample we have:
Alternative formula for the standard deviation:
Properties of the standard deviation:
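The three measures of variability above can be sketched in Python as follows. The formulas are assumptions based on the usual textbook definitions, not taken from these notes. The population standard deviation divides the sum of squared deviation scores by N before taking the square root. The sample version divides by N - 1. The alternative (computational) formula gets the same sum of squares from the sum of X squared minus (sum of X) squared over N. All function names and scores are made up.

```python
import math

def value_range(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)

def deviation_scores(scores):
    """Distance of each raw score from the mean of the distribution."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]

def sum_of_squares(scores):
    """Sum of the squared deviation scores."""
    return sum(d * d for d in deviation_scores(scores))

def population_sd(scores):
    # Assumed definition: divide the sum of squares by N.
    return math.sqrt(sum_of_squares(scores) / len(scores))

def sample_sd(scores):
    # Assumed definition: divide the sum of squares by N - 1.
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))

def sample_sd_computational(scores):
    # Assumed alternative formula: SS = sum(X^2) - (sum X)^2 / N.
    n = len(scores)
    ss = sum(x * x for x in scores) - sum(scores) ** 2 / n
    return math.sqrt(ss / (n - 1))

# Hypothetical scores, for illustration only (mean = 5, sum of squares = 32).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print("range:", value_range(scores))
print("deviation scores:", deviation_scores(scores))
print("population SD:", population_sd(scores))
print("sample SD:", sample_sd(scores))
print("sample SD (alternative formula):", sample_sd_computational(scores))
```

The deviation scores always sum to zero, which is why they are squared before being averaged.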