Question
Is there anyone that could help me with this R-related question?
---
In R, look at the cars data, i.e. the variable cars with two columns cars$speed and cars$dist, included in the standard distribution. Fit a 4th-order polynomial regression of the form
$$\text{dist} = \beta_0 + \sum_{i=1}^{4} \beta_i \, \text{speed}^i$$
Do any of the regressors have significant t-statistics? Does the regression have a significant F-statistic? Use the command step( model ) to run a backward variable selection to minimize AIC. Which terms remain in the model? What is the final AIC value? Now run a forward selection. Is the resulting model the same as in the backward case?
---
Here AIC is the Akaike Information Criterion. The R code for the model is just lm(dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4), data = cars), and the cars data set is standard in R. Can anyone help confirm the analysis? Thanks in advance.
Explanation / Answer
Here is the R code:
data("cars")
cars
## fit the model
fit <- lm(dist~ speed+I(speed^2)+I(speed^3)+I(speed^4),data=cars)
## use the summary function to see the results
summary(fit)
The results are:
Call:
lm(formula = dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4),
data = cars)
Residuals:
Min 1Q Median 3Q Max
-23.701 -8.766 -2.861 7.158 42.186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.845412 60.849115 0.753 0.455
speed -18.962244 22.296088 -0.850 0.400
I(speed^2) 2.892190 2.719103 1.064 0.293
I(speed^3) -0.151951 0.134225 -1.132 0.264
I(speed^4) 0.002799 0.002308 1.213 0.232
Residual standard error: 15.13 on 45 degrees of freedom
Multiple R-squared: 0.6835, Adjusted R-squared: 0.6554
F-statistic: 24.3 on 4 and 45 DF, p-value: 9.375e-11
None of the t-test p-values is below 0.05, so no individual regressor is statistically significant at the 5% level. The F-statistic, however, has a p-value of 9.375e-11, far below 0.05, so the model as a whole is statistically significant. This is a typical symptom of multicollinearity: speed, speed^2, speed^3 and speed^4 are strongly correlated with one another, which inflates the standard errors of the individual coefficients even though the regressors jointly explain a large share of the variation in dist (R-squared of 0.68).
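The same two checks can also be done programmatically. A minimal sketch, assuming only base R (the stats package): the t-test p-values are read from the coefficient matrix returned by summary(), and the F-test p-value, which summary.lm() prints but does not store, is recomputed with pf(). The last part illustrates the multicollinearity point by correlating the raw powers of speed and refitting with orthogonal polynomials via poly(). The object names (p_t, p_F, X, fit_orth) are illustrative only.

```r
data("cars")
fit <- lm(dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4), data = cars)
s <- summary(fit)

# t-test p-values, one per coefficient (column "Pr(>|t|)" of the coefficient matrix)
p_t <- coef(s)[, "Pr(>|t|)"]
print(p_t)
# TRUE if any regressor (excluding the intercept) is significant at the 5% level
any(p_t[-1] < 0.05)

# Overall F-test: fstatistic holds the statistic and its two degrees of freedom
f <- s$fstatistic
p_F <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
print(p_F)

# Multicollinearity: pairwise correlations between the raw powers of speed
X <- with(cars, cbind(s1 = speed, s2 = speed^2, s3 = speed^3, s4 = speed^4))
print(round(cor(X), 3))

# Orthogonal polynomials span the same column space, so the fitted values are
# identical, but their columns are uncorrelated, so the t-tests are not
# distorted by collinearity
fit_orth <- lm(dist ~ poly(speed, 4), data = cars)
all.equal(fitted(fit), fitted(fit_orth))
summary(fit_orth)
```

Because both fits have the same residuals and degrees of freedom, their F-statistics agree; only the per-coefficient t-tests change.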
## perform the backward stepwise regression
step(fit,direction="backward")
The result is:
Start: AIC=276.38
dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4)
Df Sum of Sq RSS AIC
- speed 1 165.52 10463 275.18
- I(speed^2) 1 258.90 10557 275.62
- I(speed^3) 1 293.28 10591 275.79
- I(speed^4) 1 336.55 10634 275.99
<none> 10298 276.38
Step: AIC=275.18
dist ~ I(speed^2) + I(speed^3) + I(speed^4)
Df Sum of Sq RSS AIC
- I(speed^4) 1 402.20 10866 275.07
- I(speed^3) 1 407.78 10871 275.09
<none> 10463 275.18
- I(speed^2) 1 650.31 11114 276.19
Step: AIC=275.07
dist ~ I(speed^2) + I(speed^3)
Df Sum of Sq RSS AIC
- I(speed^3) 1 5.60 10871 273.09
<none> 10866 275.07
- I(speed^2) 1 609.51 11475 275.80
Step: AIC=273.09
dist ~ I(speed^2)
Df Sum of Sq RSS AIC
<none> 10871 273.09
- I(speed^2) 1 21668 32539 325.91
Call:
lm(formula = dist ~ I(speed^2), data = cars)
Coefficients:
(Intercept) I(speed^2)
8.860 0.129
The final AIC value is 273.09, and only the I(speed^2) term remains, so the final model is dist ~ I(speed^2).
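A minimal sketch for checking the backward result without reading the step-by-step trace, assuming base R only. trace = 0 suppresses the printed steps, and extractAIC() returns the equivalent degrees of freedom together with the AIC on the same scale that step() prints, n*log(RSS/n) + 2*edf. Two side notes: calling step(fit) with neither direction nor scope, as the question suggests, only considers dropping terms, so it should reproduce the backward run; and AIC() reports a different number because it uses the full Gaussian log-likelihood, which for a fixed data set shifts every model by the same constant, leaving the ranking of models unchanged. The names back and back_default are illustrative.

```r
data("cars")
fit <- lm(dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4), data = cars)

# Backward elimination with the per-step trace suppressed
back <- step(fit, direction = "backward", trace = 0)
attr(terms(back), "term.labels")   # terms kept in the final model
extractAIC(back)                   # c(edf, AIC) on the scale step() prints

# With neither 'direction' nor 'scope' given, step() only drops terms
back_default <- step(fit, trace = 0)
identical(attr(terms(back_default), "term.labels"), attr(terms(back), "term.labels"))

# AIC() uses the full log-likelihood; its offset from extractAIC() is the
# same for every lm fitted to these 50 rows
AIC(back) - extractAIC(back)[2]
AIC(fit) - extractAIC(fit)[2]
```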
## perform the forward stepwise regression
step(fit,direction="forward")
The result is:
Start: AIC=276.38
dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4)
Call:
lm(formula = dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4),
data = cars)
Coefficients:
(Intercept) speed I(speed^2) I(speed^3) I(speed^4)
45.845412 -18.962244 2.892190 -0.151951 0.002799
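Hence the resulting model differs from the backward case: forward selection keeps the full quartic model, whereas backward elimination ends at dist ~ I(speed^2). Note, however, that forward selection was started from the full model without a scope argument, so there were no terms left to add and step() simply returned the starting model (the AIC stays at 276.38). A more meaningful comparison starts the forward search from the intercept-only model and passes the full formula as the upper scope.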
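A minimal sketch of a forward search that can actually add terms, assuming base R only: start from the intercept-only model and pass the full quartic formula as scope (a single formula given to scope is taken as the upper model, with an empty lower model). The first part reproduces the answer's call to confirm that forward selection from the full model returns it unchanged. No result is claimed here for the null-model search; run it to see which terms it selects and compare them with the backward result. The names same, null_fit and fwd are illustrative.

```r
data("cars")
fit <- lm(dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4), data = cars)

# As in the answer: forward from the full model with no scope, so nothing can be added
same <- step(fit, direction = "forward", trace = 0)
all.equal(coef(same), coef(fit))

# Forward selection proper: begin with the intercept only, allow up to the full quartic
null_fit <- lm(dist ~ 1, data = cars)
fwd <- step(null_fit, scope = formula(fit), direction = "forward")
attr(terms(fwd), "term.labels")   # terms added by the forward search
extractAIC(fwd)                   # final AIC on the same scale as the backward run
```

Starting from the null model makes the two searches explore the same set of candidate terms from opposite ends, so any difference in the final models reflects the greedy search path rather than a missing scope.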