These lecture notes count as both sets 3 and 5.

CSCI A113
Lecture Notes Six

First semester 2000-2001


Probability, Statistics, Correlation and Linear Regression.
In class today we are going to finish chapter 4 on correlation.

The main focus will be on understanding the formulas for:

  1. range

  2. variance

  3. standard deviation

  4. covariance

  5. correlation coefficient
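For reference, the five formulas can be written as follows. The notation assumes observations x_1, ..., x_n (and, for the paired measures, y_1, ..., y_n) and divides by n, matching the "mean of ..." definitions used in the analysis steps later in these notes. Some textbooks divide variance and covariance by n - 1 instead; the correlation coefficient r comes out the same either way, because that factor cancels in the ratio.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{range} &= \max_i x_i - \min_i x_i \\
\sigma_x^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\sigma_x &= \sqrt{\sigma_x^2} \\
\operatorname{cov}(x, y) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
r &= \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
\end{aligned}
```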

We have already explained why we study the coefficient of correlation, namely because of its connection to regression.

We will review these concepts in anticipation of the exams next week.

Review notes will be posted on Monday before the review session in Swain East 140 (7pm-9pm).

Here are some exercises that you might see on the next assignment (number 3).

1. Given a set of numbers:

1, 3, 2, 7, 6, 6, 3, 4, 9, 3, 4, 0, 2, 1, 1, 1, 4, 3, 8, 8, 2, 4
Calculate (using Excel) the following measures:
  1. the number of numbers (using COUNT)

  2. the sum of all numbers (using SUM)

  3. the mean (using SUM and COUNT)

  4. the variance (using any of the values calculated above)

  5. the standard deviation (using the variance calculated above)

  6. the range (using MIN and MAX)
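If you want to check your spreadsheet answers, the following Python sketch mirrors each Excel step: `len` plays the role of COUNT, `sum` of SUM, and `max`/`min` of MAX/MIN. It assumes the population variance, which divides by n; Excel's VARP and STDEVP use that convention, whereas VAR and STDEV divide by n - 1, so your answer may differ slightly depending on which function you use.

```python
import math

# Data from exercise 1.
numbers = [1, 3, 2, 7, 6, 6, 3, 4, 9, 3, 4, 0, 2, 1, 1, 1, 4, 3, 8, 8, 2, 4]

count = len(numbers)          # like Excel COUNT
total = sum(numbers)          # like Excel SUM
mean = total / count          # SUM divided by COUNT

# Population variance (assumed): mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in numbers) / count

std_dev = math.sqrt(variance)               # square root of the variance
value_range = max(numbers) - min(numbers)   # like MAX minus MIN

print("count:", count)
print("sum:", total)
print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)
print("range:", value_range)
```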

Knowing the algebraic relationship between two variables lets us predict the value of one variable from the value of the other.
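As a minimal illustration, assume the relationship is an exact straight line, y = a + b * x. Once a and b are known, predicting y for any x is a single evaluation. The function name `predict` and the Celsius-to-Fahrenheit example (a = 32, b = 1.8) are hypothetical choices for this sketch, not part of the exercises.

```python
def predict(x, intercept, slope):
    """Predict y from x, assuming the linear relationship y = intercept + slope * x."""
    return intercept + slope * x

# Hypothetical example: Fahrenheit = 32 + 1.8 * Celsius.
# Knowing the two constants lets us predict Fahrenheit from any Celsius value.
boiling_f = predict(100, 32, 1.8)
freezing_f = predict(0, 32, 1.8)
print(boiling_f, freezing_f)
```

With data that do not lie exactly on a line, a and b have to be estimated first, which is where regression comes in.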

2. A Wall Street Journal article (June 16, 1987) provided the following data on outstanding credit card balances and interest rates charged at 10 selected banks.

Bank   Outstanding balance   Most common
       (in billions)         interest rate (%)
   1          9.10               19.8
   2          5.30               19.8
   3          4.50               17.5
   4          3.30               19.8
   5          2.50               17.8
   6          2.00               17.9
   7          1.94               20.0
   8          1.27               19.8
   9          1.20               19.8
  10          0.99               17.7
The article suggested that banks with smaller outstanding credit card balances charge lower interest rates; in other words, it claimed that the table above can be summarized by this statement. Is this claim true?

Analyze the data:

  1. Calculate the means for the two variables: Outstanding Balance (x) and Interest Rate (y).

  2. Calculate the deviations from the mean for each of the two variables (for covariance).

  3. Also calculate the squares of these deviations (you will need them for the standard deviation).

  4. Plot the variables and the deviations from the mean.

  5. Calculate the covariance of the two variables (the mean of the products of the paired deviations).

  6. Calculate the standard deviations for x and y (the square root of the mean of the squared deviations).

  7. Calculate the coefficient of correlation (divide the covariance by the product of the standard deviations).
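The analysis steps above can be sketched in Python as follows; step 4, the plot, is left to Excel or any plotting tool. This is a minimal sketch assuming the divide-by-n definitions given in the steps, and the variable names are illustrative. Excel's CORREL function should give the same r, since the choice between n and n - 1 cancels in the ratio.

```python
import math

# Data from the table above (Wall Street Journal, June 16, 1987).
balance = [9.10, 5.30, 4.50, 3.30, 2.50, 2.00, 1.94, 1.27, 1.20, 0.99]  # x, in billions
rate = [19.8, 19.8, 17.5, 19.8, 17.8, 17.9, 20.0, 19.8, 19.8, 17.7]     # y, in percent

n = len(balance)

# Step 1: means of x and y.
mean_x = sum(balance) / n
mean_y = sum(rate) / n

# Step 2: deviations from the mean.
dev_x = [x - mean_x for x in balance]
dev_y = [y - mean_y for y in rate]

# Step 3: squared deviations.
sq_dev_x = [d * d for d in dev_x]
sq_dev_y = [d * d for d in dev_y]

# Step 5: covariance = mean of the products of paired deviations (divide by n).
covariance = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / n

# Step 6: standard deviation = square root of the mean squared deviation.
std_x = math.sqrt(sum(sq_dev_x) / n)
std_y = math.sqrt(sum(sq_dev_y) / n)

# Step 7: coefficient of correlation.
r = covariance / (std_x * std_y)

print(f"mean x = {mean_x:.3f}, mean y = {mean_y:.3f}")
print(f"covariance = {covariance:.4f}")
print(f"sd x = {std_x:.4f}, sd y = {std_y:.4f}")
print(f"r = {r:.4f}")
```

Comparing r with 0 (no linear relationship) and with +1 (a perfect positive linear relationship) is what lets you judge the article's claim.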

Remember that:


Last updated: November 9, 2000 by Adrian German for A113