Fall Semester 2002


Lecture Notes One: Getting started.
Searching for truth is an on-going struggle.

Historically, four methods have been employed to acquire knowledge, settle our questions, and bring us closer to knowing what is true and what is not. We summarize these methods below:

  1. AUTHORITY

    When using this method we believe something is true because an authority says so. For example, physicists claim there are electrons, and I believe them, although I haven't seen an electron myself. Likewise, the Surgeon General says smoking is bad for your health, and I believe him (although that's not the real reason I don't smoke).

  2. RATIONALISM

    The method of rationalism uses reasoning alone to arrive at knowledge. This is the way to go, but reason alone can't always take you all the way. In most cases you need some evidence (data) as well. So reasoning is only part of the process, and not synonymous with it.

  3. INTUITION

    Where do all the (crazy) ideas come from? By intuition we mean sudden insight. This, however, is a very mysterious process, about which we have only the most rudimentary understanding.

  4. SCIENTIFIC METHOD

    This method uses reasoning and intuition, but relies on objective assessment. Through rationalism and intuition a scientist forms a hypothesis about some fact, or some reality. An experiment is then designed, resulting in measurements. The data from the experiment are analyzed, and the hypothesis is either supported or rejected.

This will be a course in data analysis.

There are seven parts to this class, fourteen lectures, about as many lab assignments, five homework assignments, one midterm exam, two practical exams, and a final written exam. To learn the material described in the notes you don't need to be a whiz in calculus or differential equations. To be successful you must be able to do elementary algebra and a few other mathematical operations. To help you review, the lab notes for tomorrow will cover the prerequisite mathematics for this class. Most (if not all) of the material should be pretty basic, but please review it. As they say, better safe than sorry.

Where do all the numbers come from?

Scientific research may be divided into two categories:

A. OBSERVATIONAL STUDIES

In these studies all you can do is take notes. Included in this category of research are:

  1. Naturalistic Observation: Much anthropological and ethological research is of this type. In this research the main goal is to find out what's going on (that is, to obtain an accurate description of the situation that is being studied).

  2. Parameter Estimation: This kind of research is conducted on samples to estimate the level of one or more population characteristics. Surveys, public opinion polls, and much market research fall into this category.

  3. Correlational Studies: In these studies the investigator focuses attention on two or more variables to determine whether they are related.

B. TRUE EXPERIMENTS

In this type of research an attempt is made to determine whether changes in one variable produce changes in one or more other variables. In this case you have the freedom to make changes and observe the results.

To summarize: we keep searching for truth. We need knowledge, as much as we can acquire. Rationalism, intuition, authority and scientific experiments are our tools. Of paramount importance are the data (evidence) that we collect, and we collect them through a process of measurement.

Let's now look at measurement scales.

NOMINAL SCALES

This is the lowest level of measurement, used with variables that are qualitative in nature. Objects are measured by the category they belong to. On a used car lot we have all the Mazdas, Toyotas, Chevys, and so forth. Or we could sort the cars by type: small sedans, family sedans, SUVs, minivans, trucks, and so forth. We could even sort them by model year, as long as the year is treated only as a label.

The categories are just names: there is no order among them, and no category is more or less than another.
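
As a minimal sketch (written in Python, which is not the tool used in this class, and with a car list made up for illustration), about the only thing we can do with nominal data is count how many objects fall into each category:

```python
from collections import Counter

# Hypothetical used car lot: each car is "measured" only by its make.
cars = ["Mazda", "Toyota", "Chevy", "Toyota", "Mazda", "Toyota"]

# Counting category membership is a valid operation on a nominal scale.
counts = Counter(cars)
for make, how_many in counts.items():
    print(make, how_many)

# The most frequent category is meaningful; an "average make" is not.
most_common_make, most_common_count = counts.most_common(1)[0]
print("Most common make:", most_common_make)
```

Note that nothing in the code compares one make against another; the labels are only ever tested for equality.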

ORDINAL SCALES

This is the next higher level of measurement. On such a scale we could say that Michael Jordan was a better basketball player than Rik Smits, and Rik Smits was a better basketball player than your lab instructor. Chances are that the difference between MJ and Rik Smits is not as big as between Rik Smits and your lab instructor, but on an ordinal scale, this does not matter.

As another example, the Sears Tower in Chicago is taller than the Empire State Building in NY, and the Empire State Building is taller than Ballantine. An ordinal scale only cares about which one is taller, not by how much.

INTERVAL SCALES and RATIO SCALES

The Celsius and Fahrenheit scales of temperature are interval scales: equal differences on the scale represent equal differences in temperature, but the zero point is arbitrary. On such a scale we can say that a temperature of 93F is greater than one of 91F, and also that this difference (2 degrees) is not as big as the difference between 91F and 80F (11 degrees). The same goes for Celsius.

A ratio scale is one that has an absolute zero point. The Kelvin scale of temperature is such a scale. As a consequence a temperature of 200K is twice as hot as a temperature of 100K. The Celsius and Fahrenheit scales have their zeros in arbitrary places and are not absolute in any way, so ratios of Celsius or Fahrenheit temperatures are not meaningful (the Celsius scale is ideal for cooking, while the Fahrenheit scale is mostly oriented towards human body temperatures and weather temperatures).

The Kelvin scale, though, is an absolute (ratio) scale of measurement.
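
A short worked example may help here. The sketch below (in Python, used only for illustration; the function name is our own) takes the two temperatures mentioned above, 200K and 100K, and converts them to Celsius. The difference between them survives the conversion, but the ratio does not:

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvins to degrees Celsius."""
    return kelvin - 273.15

hot_k, cold_k = 200.0, 100.0

# On the Kelvin (ratio) scale the ratio of two temperatures is meaningful.
ratio_kelvin = hot_k / cold_k

# The same two temperatures expressed on the Celsius (interval) scale.
hot_c = kelvin_to_celsius(hot_k)
cold_c = kelvin_to_celsius(cold_k)
ratio_celsius = hot_c / cold_c

# Differences agree on both scales; ratios do not.
diff_kelvin = hot_k - cold_k
diff_celsius = hot_c - cold_c

print("ratio in K:", ratio_kelvin, " ratio in C:", round(ratio_celsius, 3))
print("difference in K:", diff_kelvin, " difference in C:", round(diff_celsius, 2))
```

The ratio in kelvins is 2, while the "ratio" of the same two temperatures in Celsius is well below 1, which is why "twice as hot" only makes sense on the ratio scale.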

Before we go on let's present a brief week by week schedule for this class.

Week 1
Introduction to scientific data analysis
Excel and data analysis

Week 2
Measures of Central Tendency and Variability.
Data Distributions (I)

Week 3
Data Distributions (II)
Measurement, error and uncertainty

Week 4
Midterm Review. MIDTERM EXAM

Week 5
THANKSGIVING WEEK

Week 6
Linear Regression. Correlation.

Week 7
Tests of significance.

Week 8
Final Review FINAL EXAM

Data analysis (or statistical analysis) is divided into two areas: descriptive statistics and inferential statistics.

Both involve analyzing data. If the analysis is done for the purpose of describing or characterizing the data that have been collected, then we are in the area of descriptive statistics. For example, when we record the scores from an exam, such as the one we talked about last time, we hand the tests back and then we want to describe the scores. We might decide to

  1. calculate the average of the distribution so as to describe its central tendency.

  2. determine its range, so as to characterize its variability.

  3. plot the scores on a graph (histogram) so as to show the shape of the distribution.

Since all of these procedures are for the purpose of describing or characterizing the data already collected, they fall within the realm of descriptive statistics. Inferential statistics, on the other hand, is not concerned with just describing the obtained data. Rather, it embraces techniques that allow one to use the obtained sample data to draw conclusions about populations.
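
The first two descriptive procedures listed above can be sketched in a few lines. This is only an illustration in Python (the class itself uses Excel), and the scores are hypothetical, made up for this example; they are not the actual exam scores:

```python
# Hypothetical exam scores (made up for illustration; not the class data).
scores = [86, 72, 91, 65, 78, 86, 59, 94, 81, 70]

# 1. Central tendency: the average of the distribution.
average = sum(scores) / len(scores)

# 2. Variability: the range, i.e. highest score minus lowest score.
score_range = max(scores) - min(scores)

print("average:", average)
print("range:", score_range)
```

Both numbers describe only the scores we already have in hand, which is what makes them descriptive rather than inferential.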

Descriptive Statistics
is concerned with techniques that are used to describe or characterize data.

Inferential Statistics
involves techniques that use the obtained sample data to draw inferences about populations.

Let's now look at

FREQUENCY DISTRIBUTIONS

Let's suppose you have just been handed back your first exam. You received an 86. Naturally, you are interested in how well you did relative to the other students.

Here are the raw scores. How well did you do?

Fortunately we can start analyzing this with Excel.

Today we look at the following components of Excel's Analysis ToolPak:

  1. Histogram
  2. Rank and Percentile

Here are the definitions behind these two tools.

Percentile Rank
The percentile rank of a score is the percentage of scores lower than the score in question.
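
The definition above translates directly into a small function. This is a sketch in Python using hypothetical scores (made up for illustration); the function name is our own, and Excel's Rank and Percentile tool may use a slightly different convention, so its output need not match exactly:

```python
def percentile_rank(scores, score):
    """Percentage of the scores that are strictly lower than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)

# Hypothetical exam scores (not the actual class data).
scores = [86, 72, 91, 65, 78, 86, 59, 94, 81, 70]

rank_of_86 = percentile_rank(scores, 86)
print("percentile rank of 86:", rank_of_86)
```

With these made-up scores, six of the ten scores are lower than 86, so the function reports a percentile rank of 60.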

Histogram
The histogram is used to represent frequency distributions composed of interval or ratio data.
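
Behind every histogram is a frequency table: the scale is cut into class intervals and the scores in each interval are counted. The sketch below (Python, with hypothetical integer scores and an interval width of 10 chosen only for illustration) builds such a table and prints a crude text histogram:

```python
def frequency_table(scores, start, width, n_bins):
    """Count the scores falling in consecutive intervals [low, low + width)."""
    table = []
    for i in range(n_bins):
        low = start + i * width
        high = low + width
        count = sum(1 for s in scores if low <= s < high)
        # Label the interval as low..high-1, which assumes integer scores.
        table.append((low, high - 1, count))
    return table

# Hypothetical exam scores (not the actual class data).
scores = [86, 72, 91, 65, 78, 86, 59, 94, 81, 70]

table = frequency_table(scores, start=50, width=10, n_bins=5)
for low, high, count in table:
    print(f"{low}-{high}: {'*' * count}")
```

Choosing a different interval width changes the shape of the picture, so it is worth trying more than one.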

After we look at the data with Excel we will also define

  1. The Mode
  2. The Median
  3. The Mean

In our discussions this semester we shall be using certain technical terms. Here are the terms and their definitions:

  1. the mean

    The mean is the sum of the scores divided by the number of scores.

  2. the median

    The median is the scale value below which 50% of the scores fall.

    It is therefore the same thing as the percentile point for 50% (P50).

  3. the mode

    The mode is the most frequent score in the distribution.

    Homework One, which will be posted today (due on Friday in lab), will help you clearly distinguish the relative merits of each of these three measures of central tendency.
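
For those who want to check their work outside Excel, the three measures defined above can be computed with Python's standard `statistics` module. This is only a sketch; the scores are hypothetical, made up for illustration:

```python
import statistics

# Hypothetical exam scores (not the actual class data).
scores = [86, 72, 91, 65, 78, 86, 59, 94, 81, 70]

# The mean: the sum of the scores divided by the number of scores.
mean_score = statistics.mean(scores)

# The median: the value below which 50% of the scores fall.
# With an even number of scores it is the midpoint of the two middle ones.
median_score = statistics.median(scores)

# The mode: the most frequent score in the distribution.
mode_score = statistics.mode(scores)

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

Notice that the three measures need not agree, which is exactly what makes comparing them worthwhile.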


    Last updated: Oct 29, 2002 by Adrian German for A113