Homework Seven. What do you think?

 

a. The scatterplot is:

About 90.7% of the variation is explained by the regression on the New Mexico data and 88.4% of the variation is explained by the regression on the United States data. The slopes of the two trend lines appear to be different. One problem with these trend lines is that, if extended into the future, they will eventually cross the x-axis, indicating a negative fatality rate, which is an impossible result.
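
As a rough illustration of where the trend lines cross the x-axis, the sketch below solves intercept + slope * year = 0 using the coefficients reported in part b. It assumes the Year variable is an index counted from the start of the 40-year series (the coding is not stated here), so the result is a year index rather than a calendar year. The names `fits` and `zero_crossing` are introduced only for this example.

```python
# Fitted coefficients copied from the regression output in part b.
# Assumption: "Year" is an index measured from the start of the series.
fits = {
    "New Mexico": {"intercept": 13.15384615, "slope": -0.236163227},
    "United States": {"intercept": 8.736923077, "slope": -0.158386492},
}


def zero_crossing(intercept, slope):
    """Return the x value where the line intercept + slope * x equals zero."""
    return -intercept / slope


crossings = {name: zero_crossing(**coef) for name, coef in fits.items()}

for name, year_index in crossings.items():
    print(f"{name}: fitted rate reaches zero near year index {year_index:.1f}")
```

Under that assumption both lines reach zero at a year index between 55 and 56, roughly 15 periods past the end of the observed data, and the fitted rate is negative from then on.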

b. The regression statistics for the New Mexico data are:

Regression Statistics
Multiple R           0.952173153
R Square             0.906633714
Adjusted R Square    0.904176706
Standard Error       0.897559381
Observations         40

ANOVA
              df    SS             MS             F           Significance F
Regression    1     297.270462     297.270462     368.9992    3.65574E-21
Residual      38    30.61328799    0.805612842
Total         39    327.88375

              Coefficients    Standard Error    t Stat          P-value     Lower 95%       Upper 95%
Intercept     13.15384615     0.28924003        45.47726724     9.6E-35     12.5683103      13.73938
Year          -0.236163227    0.012294181       -19.20935085    3.66E-21    -0.261051495    -0.21127
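
The summary figures in this output are tied together by simple arithmetic, and the sketch below recomputes them from the New Mexico sums of squares as a consistency check. It uses only numbers copied from the output; the variable names are chosen for this example.

```python
import math

# Values copied from the New Mexico regression output.
n = 40
ss_regression = 297.270462
ss_residual = 30.61328799
ss_total = 327.88375
df_regression, df_residual = 1, 38
slope, slope_se = -0.236163227, 0.012294181

# R Square is the share of total variation explained by the regression.
r_square = ss_regression / ss_total
# Adjusted R Square penalizes for the number of fitted parameters.
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
# Standard Error is the square root of the residual mean square.
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)
# F is the regression mean square over the residual mean square.
f_stat = (ss_regression / df_regression) / ms_residual
# t Stat is the coefficient divided by its standard error.
t_slope = slope / slope_se

print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adj_r_square:.6f}")
print(f"Standard Error    {standard_error:.6f}")
print(f"F                 {f_stat:.4f}")
print(f"t Stat (Year)     {t_slope:.4f}")
```

With a single predictor the F statistic equals the square of the slope's t statistic, which is why Significance F and the P-value for Year agree.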


The regression statistics for the United States data are:

Regression Statistics
Multiple R           0.940149084
R Square             0.883880299
Adjusted R Square    0.880824518
Standard Error       0.67990177
Observations         40

ANOVA
              df    SS             MS             F           Significance F
Regression    1     133.7098762    133.7098762    289.2485    2.33234E-19
Residual      38    17.56612383    0.462266417
Total         39    151.276

              Coefficients    Standard Error    t Stat          P-value     Lower 95%       Upper 95%
Intercept     8.736923077     0.219099497       39.87650913     1.28E-32    8.293379319     9.180466835
Year          -0.158386492    0.009312849       -17.0073078     2.33E-19    -0.17723937     -0.139533613
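
The observation in part a that the two slopes appear to differ can be checked informally against the 95% confidence intervals for the Year coefficients. The sketch below rebuilds both intervals as estimate plus or minus t times the standard error and tests whether they overlap. The critical value is hard-coded as an approximation for 38 degrees of freedom, and non-overlapping intervals are only an informal indication, not a formal test of equal slopes.

```python
# Slope estimates and standard errors copied from the two coefficient tables.
slopes = {
    "New Mexico": (-0.236163227, 0.012294181),
    "United States": (-0.158386492, 0.009312849),
}

# Two-sided 95% critical value of Student's t with 38 degrees of freedom,
# hard-coded (about 2.0244) so the sketch needs no external libraries.
T_CRIT_38 = 2.0244


def confidence_interval(estimate, std_error, t_crit=T_CRIT_38):
    """Return (lower, upper) bounds: estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


intervals = {name: confidence_interval(b, se) for name, (b, se) in slopes.items()}
nm_low, nm_high = intervals["New Mexico"]
us_low, us_high = intervals["United States"]

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = nm_high >= us_low and us_high >= nm_low

for name, (low, high) in intervals.items():
    print(f"{name}: slope 95% CI = [{low:.4f}, {high:.4f}]")
print("Intervals overlap:", intervals_overlap)
```

The New Mexico interval lies entirely below the United States interval, which is consistent with a steeper decline in the New Mexico rate.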

The residual plot for the New Mexico data is:


The residual plot for the United States data is:

There is some indication in these plots that the regression assumption of independent residuals has been violated. There appears to be some time factor at work in the residual plots.

c. New Mexico: Durbin-Watson = 0.719. Runs = 11. Runs p-value = 0.001

United States: Durbin-Watson = 0.270. Runs = 5. Runs p-value < 0.0001

The Durbin-Watson statistic is close to 0 for the United States data and well below 2 for the New Mexico data, which suggests positive autocorrelation. For both sets of data the runs test p-value is significant, indicating fewer runs than would be expected. This causes us to doubt that the assumption of independence of residuals has been met.
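
For reference, both diagnostics can be sketched in a few lines. The code below is a minimal illustration, not the procedure used to produce the figures above: the Durbin-Watson function is shown on hypothetical toy residuals because the homework residuals are not listed here, and the runs test uses the normal approximation with an assumed even split of 20 positive and 20 negative residuals.

```python
import math


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def count_runs(residuals):
    """Number of runs of consecutive residuals sharing the same sign."""
    signs = [e > 0 for e in residuals if e != 0]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


def runs_test(runs, n_pos, n_neg):
    """Normal-approximation runs test; returns (z, two-sided p-value)."""
    n = n_pos + n_neg
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical toy residuals (not the homework data): long same-sign stretches
# give a Durbin-Watson value well below 2 and very few runs.
toy = [1.0] * 5 + [-1.0] * 5
toy_dw = durbin_watson(toy)
toy_runs = count_runs(toy)

# Run counts reported in part c; the 20/20 sign split is an assumption.
nm_z, nm_p = runs_test(11, 20, 20)
us_z, us_p = runs_test(5, 20, 20)

print(f"toy: Durbin-Watson = {toy_dw:.2f}, runs = {toy_runs}")
print(f"New Mexico: z = {nm_z:.2f}, p = {nm_p:.4f}")
print(f"United States: z = {us_z:.2f}, p = {us_p:.2g}")
```

Under the assumed split, about 21 runs would be expected by chance, so 11 and 5 runs are both far too few. The resulting p-values are near 0.001 for New Mexico and far below 0.0001 for the United States, in line with the reported values, although the software that produced those values may have used a different split or an exact test.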