These lecture notes count as both sets 3 and 5.
CSCI A113
Lecture Notes Six
First semester 2000-2001
Probability, Statistics, Correlation and Linear Regression.
In class today we are going to finish chapter 4 on correlation.
The main focus will be understanding the formulas for:
- range
- variance
- standard deviation
- covariance
- correlation coefficient
We have made it clear why we study the coefficient of correlation, in terms of regression.
We will review these concepts in anticipation of the exams next week.
Review notes will be posted on Monday before the review session in Swain East 140 (7pm-9pm).
Here are some exercises that you might see on the next assignment (number 3).
1. Given a set of numbers:
1, 3, 2, 7, 6, 6, 3, 4, 9, 3, 4, 0, 2, 1, 1, 1, 4, 3, 8, 8, 2, 4
Calculate (using Excel) the following measures:
- the number of numbers (using COUNT)
- the sum of all numbers (using SUM)
- the mean (using SUM and COUNT)
- the variance (using any of the values calculated above)
- the standard deviation (using the variance calculated above)
- the range (using MIN and MAX)
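If you want to check your spreadsheet results, the same measures can be sketched in Python. This is only a cross-check, not part of the assignment. It assumes the population variance (mean of the squared deviations, dividing by n), which is what Excel's VARP computes; Excel's VAR divides by n - 1 and will give a slightly larger value.

```python
import math

numbers = [1, 3, 2, 7, 6, 6, 3, 4, 9, 3, 4, 0, 2, 1, 1, 1, 4, 3, 8, 8, 2, 4]

count = len(numbers)                 # same role as COUNT
total = sum(numbers)                 # same role as SUM
mean = total / count                 # SUM divided by COUNT

# Assumption: population variance (divide by n, not n - 1).
variance = sum((v - mean) ** 2 for v in numbers) / count
std_dev = math.sqrt(variance)        # square root of the variance

value_range = max(numbers) - min(numbers)   # MAX minus MIN

print("count:", count)
print("sum:", total)
print("mean:", round(mean, 4))
print("variance:", round(variance, 4))
print("standard deviation:", round(std_dev, 4))
print("range:", value_range)
```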
Knowing the algebraic relationship between two variables enables predictions for one
of the variables based on values of the other.
2. A Wall Street Journal (June 16, 1987) article
provided the following data on outstanding credit card balances and interest rates
charged at 10 selected banks.
| Bank | Outstanding balances (billions of dollars) | Most common interest rate (%) |
| 1 | 9.10 | 19.8 |
| 2 | 5.30 | 19.8 |
| 3 | 4.50 | 17.5 |
| 4 | 3.30 | 19.8 |
| 5 | 2.50 | 17.8 |
| 6 | 2.00 | 17.9 |
| 7 | 1.94 | 20.0 |
| 8 | 1.27 | 19.8 |
| 9 | 1.20 | 19.8 |
| 10 | 0.99 | 17.7 |
The article suggested that banks with smaller outstanding credit card balances
charge lower interest rates (in other words, that the table above can be summarized
by this statement). Is this true?
Analyze the data:
- Calculate the means for the two variables: Outstanding Balance (x) and Interest Rate (y).
- Calculate the deviations from the mean for each of the two variables (for covariance).
- Also calculate the squares of these deviations (you will need them for the standard deviation).
- Plot the variables and the deviations from the mean.
- Calculate the covariance of the two variables (the mean of the products of the deviations).
- Calculate standard deviations for x and y (the square root of the mean of the squared deviations).
- Calculate the coefficient of correlation (divide covariance by product of standard deviations).
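The calculation steps above (everything except the plot) can be sketched in Python as a way to check your Excel work. The variable names are my own, and the sketch assumes the population formulas described in the steps (divide by n, not n - 1).

```python
import math

# Data from the table: balances (x) and interest rates (y) for banks 1 to 10.
x = [9.10, 5.30, 4.50, 3.30, 2.50, 2.00, 1.94, 1.27, 1.20, 0.99]
y = [19.8, 19.8, 17.5, 19.8, 17.8, 17.9, 20.0, 19.8, 19.8, 17.7]
n = len(x)

# Means of the two variables.
mean_x = sum(x) / n
mean_y = sum(y) / n

# Deviations from the mean for each variable.
dev_x = [v - mean_x for v in x]
dev_y = [v - mean_y for v in y]

# Covariance: mean of the products of the deviations.
covariance = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / n

# Standard deviations: square root of the mean of the squared deviations.
std_x = math.sqrt(sum(dx ** 2 for dx in dev_x) / n)
std_y = math.sqrt(sum(dy ** 2 for dy in dev_y) / n)

# Coefficient of correlation: covariance over the product of standard deviations.
r = covariance / (std_x * std_y)

print("mean x:", round(mean_x, 4), " mean y:", round(mean_y, 4))
print("covariance:", round(covariance, 4))
print("std x:", round(std_x, 4), " std y:", round(std_y, 4))
print("correlation coefficient:", round(r, 4))
```

Note that dividing by n or by n - 1 throughout gives the same correlation coefficient, because the factor cancels between the covariance and the two standard deviations.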
Remember that:
- A correlation coefficient will be between -1 and 1.
- Values close to 1 indicate a strong positive correlation.
- Values close to -1 indicate a strong negative correlation.
- Values close to 0 indicate no linear correlation.
- Values around 0.5 indicate a weak positive correlation.
- Values around -0.5 indicate a weak negative correlation.
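As a rough illustration of these rules of thumb, here is a small Python helper that turns a coefficient into one of the labels above. The function name and the cutoffs 0.75 and 0.25 are assumptions made only for this sketch; the notes say "close to" and "around" without fixing exact boundaries.

```python
def describe_correlation(r, strong=0.75, weak=0.25):
    """Label a correlation coefficient using hypothetical cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must be between -1 and 1")
    size = abs(r)
    # Below the (assumed) weak cutoff we treat the value as "close to 0".
    if size < weak:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    # At or above the (assumed) strong cutoff we treat the value as "close to 1".
    strength = "strong" if size >= strong else "weak"
    return f"{strength} {direction} correlation"


for value in (0.9, 0.5, 0.05, -0.5, -0.9):
    print(value, "->", describe_correlation(value))
```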
Last updated: November 9, 2000 by Adrian German for A113