Question
Use the computer to simulate 100 data points from a normal distribution with mean 0 and variance 1. Store the results in a column called Y. Repeat this process 10 more times, storing the results in X1, X2, ..., X10. Notice that Y should be totally unrelated to the explanatory variables.
(a) Fit the regression of Y on all 10 explanatory variables. What is R²?
(b) What model is suggested by forward selection?
(c) Which model has the smallest Cp statistic?
(d) Which model has the smallest BIC?
(e) What danger (if any) is there in using a variable selection technique when the number of explanatory variables is a substantial proportion of the sample size?
Explanation / Answer
Ans: The answers below were obtained with R.
Following the problem statement, the data set is generated as follows:
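# No seed was set, so a rerun will not reproduce the numbers shown below
Y <- rnorm(100, 0, 1)                # response: 100 draws from N(0, 1)
X <- matrix(rnorm(1000), 100, 10)    # 10 independent N(0, 1) explanatory variables
X1 <- as.data.frame(X)               # columns are named V1..V10, as in the output below
data <- data.frame(X, Y)             # combined copy; here the columns are named X1..X10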
(a) Fitting regression model on the data
model=lm(Y~.,data=X1)
summary(model)
This gives the following summary:
Call:
lm(formula = Y ~ ., data = X1)
Residuals:
Min 1Q Median 3Q Max
-3.1810 -0.7501 0.0168 0.7335 2.2578
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0009599 0.1160404 -0.008 0.9934
V1 0.0938381 0.1201722 0.781 0.4370
V2 0.0931644 0.1230150 0.757 0.4508
V3 0.0436138 0.1181784 0.369 0.7130
V4 0.0203657 0.1175300 0.173 0.8628
V5 -0.0606241 0.1202659 -0.504 0.6154
V6 0.1439136 0.1139495 1.263 0.2099
V7 -0.1617294 0.1138754 -1.420 0.1590
V8 -0.0749428 0.1205066 -0.622 0.5356
V9 0.0488038 0.1176421 0.415 0.6793
V10 -0.2142865 0.1275966 -1.679 0.0966 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.117 on 89 degrees of freedom
Multiple R-squared: 0.07597, Adjusted R-squared: -0.02785
F-statistic: 0.7318 on 10 and 89 DF, p-value: 0.6927
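R² = 0.07597. The adjusted R² is negative (-0.02785) and the overall F-test is not significant (p-value 0.6927), as expected when Y is unrelated to the explanatory variables.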
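For context, here is a minimal sketch (not part of the original answer) of the R² to expect when Y is pure noise. With n = 100 observations and k = 10 unrelated normal predictors, the theoretical mean of R² is k/(n - 1) = 10/99, about 0.101, so an observed value near 0.076 is typical. The seed, the number of replications and the helper name sim_r2 are assumptions made for illustration.

```r
# Sketch: distribution of R^2 when Y is unrelated to 10 predictors.
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 100; k <- 10; reps <- 2000  # reps is an illustrative choice

# Hypothetical helper: simulate one data set and return the R^2 of the full fit
sim_r2 <- function() {
  Y <- rnorm(n)
  X <- as.data.frame(matrix(rnorm(n * k), n, k))
  summary(lm(Y ~ ., data = X))$r.squared
}

r2 <- replicate(reps, sim_r2())
mean_r2 <- mean(r2)
expected_r2 <- k / (n - 1)       # theoretical mean of R^2 under the null
print(c(simulated = mean_r2, theoretical = expected_r2))
```

The simulated mean should land close to the theoretical value, which is why a nonzero R² alone says little about whether any predictor matters.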
(b) Model selection
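library(leaps)
# nvmax defaults to 8, so only models with up to 8 variables are reported
b <- regsubsets(Y ~ ., data = X1, method = "forward")
rs <- summary(b); rs
# Selected variables, Mallows' Cp, adjusted R-squared, BIC and R-squared by model size
rs$which; rs$cp; rs$adjr2; rs$bic; rs$rsq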
The forward selection algorithm gives the following result:
Selection Algorithm: forward
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 ( 1 ) " " " " " " " " " " " " " " " " " " "*"
2 ( 1 ) " " " " " " " " " " " " "*" " " " " "*"
3 ( 1 ) " " " " " " " " " " "*" "*" " " " " "*"
4 ( 1 ) "*" " " " " " " " " "*" "*" " " " " "*"
5 ( 1 ) "*" "*" " " " " " " "*" "*" " " " " "*"
6 ( 1 ) "*" "*" " " " " " " "*" "*" "*" " " "*"
7 ( 1 ) "*" "*" " " " " "*" "*" "*" "*" " " "*"
8 ( 1 ) "*" "*" " " " " "*" "*" "*" "*" "*" "*"
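The variables enter in the order V10, V7, V6, V1, V2, V8, V5, V9. The 8-variable row, {V1, V2, V5, V6, V7, V8, V9, V10}, is only the largest model reported (regsubsets stops at 8 variables by default), so it is not by itself the suggested model. The model size still has to be chosen with a criterion such as Cp or BIC; as parts (c) and (d) show, both point to the one-variable model with V10.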
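As an alternative that includes a stopping rule, forward selection can be sketched with base R's step(), which starts from the intercept-only model and stops when no addition improves the criterion. This is a sketch on freshly simulated data, not a rerun of the analysis above; the seed and the object names dat, null_fit, full_fit and fwd are assumptions, and the BIC penalty k = log(n) is one possible choice.

```r
# Sketch: forward selection with step(), starting from the intercept-only
# model and using the BIC penalty k = log(n).
set.seed(1)                                          # arbitrary seed
n <- 100
dat <- as.data.frame(matrix(rnorm(n * 10), n, 10))   # columns V1..V10
dat$Y <- rnorm(n)                                    # response unrelated to V1..V10

null_fit <- lm(Y ~ 1, data = dat)
full_fit <- lm(Y ~ ., data = dat)

fwd <- step(null_fit,
            scope = list(lower = formula(null_fit), upper = formula(full_fit)),
            direction = "forward",
            k = log(n),      # BIC penalty; the default k = 2 gives AIC
            trace = 0)       # suppress the step-by-step printout

selected <- attr(terms(fwd), "term.labels")  # character(0) if nothing entered
print(selected)
```

Unlike the regsubsets table, this procedure can return the intercept-only model, which is the correct answer when Y is pure noise.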
(c) Smallest Cp
Among the models listed, the smallest Cp is -1.5096560, for the model containing only the intercept and V10 (the variable simulated as X10).
(d) Smallest BIC
Among the models listed, the smallest BIC is 7.295025, again for the model containing only the intercept and V10.
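Note that regsubsets lists only models with at least one predictor, so the intercept-only model does not appear in rs$cp or rs$bic. The sketch below computes its criteria from the values printed in the summary of part (a). It assumes the usual definition of Mallows' Cp, with the error variance taken from the full 10-variable model, and it assumes that the reported BIC follows the convention n*log(1 - R²) + p*log(n) with p counting the intercept. That convention is consistent with the numbers reported above, but it is an assumption, not something stated in the answer.

```r
# Values read off the summary output of the full model
n       <- 100
r2_full <- 0.07597      # Multiple R-squared of the 10-variable model
df_full <- 89           # residual degrees of freedom of the full model

# Mallows' Cp = RSS_p / sigma2_full - n + 2 * p, where p is the number of
# coefficients including the intercept. For the intercept-only model
# RSS_p = TSS, and TSS / sigma2_full = df_full / (1 - r2_full).
cp_null <- df_full / (1 - r2_full) - n + 2 * 1

# Assumed convention: BIC = n * log(1 - R^2) + p * log(n).
# The intercept-only model has R^2 = 0 and p = 1.
bic_null <- n * log(1 - 0) + 1 * log(n)

print(round(c(cp_null = cp_null, bic_null = bic_null), 3))
```

Under these assumptions the intercept-only model comes out at Cp of about -1.68 and BIC of about 4.61, both below the minima reported for the one-variable model, which agrees with Y being unrelated to every explanatory variable.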
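(e) The explanatory variables here are independent noise, so any apparent relationship with Y is due to chance. When the number of candidate variables is a substantial proportion of the sample size, a selection procedure searches over many models and is likely to pick variables whose sample correlation with Y is large purely by chance. The chosen model then overstates the fit (R² is inflated), the t-tests and p-values computed after selection are too optimistic, and the model tends to predict new data poorly.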
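The danger can be illustrated with a small simulation. The sketch below repeats the setup of this problem many times and records how often at least one pure-noise predictor looks significant at the 5% level in the full regression. If the ten tests were independent, the rate would be roughly 1 - 0.95^10, about 0.40. The seed, the number of replications and the 5% threshold are assumptions chosen for illustration, and screening by p-value stands in here for a formal selection procedure.

```r
# Sketch: how often does at least one noise variable look "significant"?
set.seed(1)                      # arbitrary seed
n <- 100; k <- 10; reps <- 1000  # reps is an illustrative choice

any_signif <- replicate(reps, {
  X <- as.data.frame(matrix(rnorm(n * k), n, k))
  Y <- rnorm(n)                  # unrelated to every column of X
  # p-values of the 10 slope coefficients (row 1 is the intercept)
  p <- summary(lm(Y ~ ., data = X))$coefficients[-1, "Pr(>|t|)"]
  any(p < 0.05)
})

frac <- mean(any_signif)         # share of data sets with a spurious "finding"
print(frac)
```

A selection procedure that keeps whichever variables look best will therefore often report a relationship even though none exists.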