Question

Use the computer to simulate 100 data points from a normal distribution with mean 0 and variance 1. Store the results in a column called Y. Repeat this process 10 more times, storing the results in X1, X2, ..., X10. Notice that Y should be totally unrelated to the explanatory variables.
(a) Fit the regression of Y on all 10 explanatory variables. What is R²?
(b) What model is suggested by forward selection?
(c) Which model has the smallest Cp statistic?
(d) Which model has the smallest BIC?
(e) What danger (if any) is there in using a variable selection technique when the number of explanatory variables is a substantial proportion of the sample size?

Explanation / Answer

Answer: Here I answer these questions using R.

Using the given information, we generate the data set as follows:

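# The original run used no seed, so your numbers will differ from the output below; call set.seed() first to make a run reproducible
Y <- rnorm(100, mean = 0, sd = 1)                # response: 100 draws from N(0, 1)
X <- matrix(rnorm(1000), nrow = 100, ncol = 10)  # 10 explanatory variables, independent of Y
X1 <- as.data.frame(X)                           # data frame with columns V1, ..., V10
data <- data.frame(X, Y)                         # columns X1, ..., X10 and Y (not used below)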
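The question names the columns Y and X1, ..., X10, whereas the code above stores V1, ..., V10 in X1 and leaves Y outside the data frame. A minimal alternative sketch that keeps everything in one data frame with the question's names; the seed 123 and the object names `dat` and `fit` are illustrative choices, not part of the original answer:

```r
set.seed(123)  # arbitrary seed, so the simulated data can be regenerated
n <- 100
dat <- data.frame(Y = rnorm(n, mean = 0, sd = 1))  # response, N(0, 1)
for (j in 1:10) {
  # X1, ..., X10: independent N(0, 1) columns, unrelated to Y
  dat[[paste0("X", j)]] <- rnorm(n, mean = 0, sd = 1)
}
fit <- lm(Y ~ ., data = dat)  # "." now means X1, ..., X10 because Y is inside dat
summary(fit)$r.squared
```

Keeping Y inside the data frame means the formula does not depend on a stray `Y` object in the workspace.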

(a) Fitting the regression model on the data

model <- lm(Y ~ ., data = X1)  # Y is not in X1, so it is taken from the workspace; "." expands to V1, ..., V10
summary(model)

This gives the following summary:

Call:
lm(formula = Y ~ ., data = X1)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1810 -0.7501  0.0168  0.7335  2.2578

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0009599  0.1160404  -0.008   0.9934
V1           0.0938381  0.1201722   0.781   0.4370
V2           0.0931644  0.1230150   0.757   0.4508
V3           0.0436138  0.1181784   0.369   0.7130
V4           0.0203657  0.1175300   0.173   0.8628
V5          -0.0606241  0.1202659  -0.504   0.6154
V6           0.1439136  0.1139495   1.263   0.2099
V7          -0.1617294  0.1138754  -1.420   0.1590
V8          -0.0749428  0.1205066  -0.622   0.5356
V9           0.0488038  0.1176421   0.415   0.6793
V10         -0.2142865  0.1275966  -1.679   0.0966 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.117 on 89 degrees of freedom
Multiple R-squared:  0.07597, Adjusted R-squared:  -0.02785
F-statistic: 0.7318 on 10 and 89 DF,  p-value: 0.6927

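R² = 0.07597. It is small but not zero, even though Y is unrelated to every explanatory variable; the adjusted R² (-0.02785) and the overall F-test (p-value 0.6927) correctly show no evidence of a relationship.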
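Why is R² not zero for pure noise? Under the null hypothesis, with an intercept and p predictors, R² follows a Beta(p/2, (n - p - 1)/2) distribution, whose mean is p/(n - 1); for p = 10 and n = 100 that is 10/99 ≈ 0.10. A small simulation sketch to check this (the helper `null_r2` and the seed are illustrative, not from the original answer):

```r
# Hypothetical helper: R^2 from regressing pure noise on p independent noise predictors
null_r2 <- function(n = 100, p = 10) {
  Y <- rnorm(n)
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  summary(lm(Y ~ X))$r.squared
}

set.seed(2024)                    # arbitrary seed
r2 <- replicate(2000, null_r2())  # 2000 simulated data sets
mean(r2)                          # should be close to p / (n - 1) = 10 / 99
```

An R² of 0.07597 is therefore entirely typical of data with no signal at all.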

(b) Model selection

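library(leaps)
# Forward selection. nvmax defaults to 8, so the search stops at 8 variables; use nvmax = 10 for the full path
b <- regsubsets(Y ~ ., data = X1, method = "forward")
rs <- summary(b)
rs                                # table of the variables in each model size
rs$cp; rs$bic; rs$rsq; rs$adjr2   # Mallows' Cp, BIC, R² and adjusted R² for each size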

The forward selection algorithm gives the following result:

Selection Algorithm: forward
         V1  V2  V3  V4  V5  V6  V7  V8  V9  V10
1  ( 1 ) " " " " " " " " " " " " " " " " " " "*"
2  ( 1 ) " " " " " " " " " " " " "*" " " " " "*"
3  ( 1 ) " " " " " " " " " " "*" "*" " " " " "*"
4  ( 1 ) "*" " " " " " " " " "*" "*" " " " " "*"
5  ( 1 ) "*" "*" " " " " " " "*" "*" " " " " "*"
6  ( 1 ) "*" "*" " " " " " " "*" "*" "*" " " "*"
7  ( 1 ) "*" "*" " " " " "*" "*" "*" "*" " " "*"
8  ( 1 ) "*" "*" " " " " "*" "*" "*" "*" "*" "*"

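Each row is the model of that size along the forward path (V1, ..., V10 in the R output correspond to X1, ..., X10): X10 enters first, followed by X7, X6, X1, X2, X8, X5 and X9. The last row (8 variables) is not the selected model; the search stops there only because regsubsets() uses nvmax = 8 by default. Forward selection needs a stopping rule to suggest a model: judged by Cp or BIC along this path (parts (c) and (d)), it suggests the model containing only X10.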
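The table is easier to read once you see what forward selection does at each step. A minimal base-R sketch, assuming the RSS-based entry rule that I understand `regsubsets(method = "forward")` to use: start from the intercept-only model and repeatedly add the variable that lowers the residual sum of squares the most. The function `forward_path` is hypothetical; it records only the order of entry and leaves the stopping rule to a criterion such as Cp or BIC.

```r
# Hypothetical helper: order in which forward selection adds the columns of X
forward_path <- function(Y, X) {
  chosen <- integer(0)
  remaining <- seq_len(ncol(X))
  while (length(remaining) > 0) {
    # RSS of the current model plus each remaining candidate
    rss <- sapply(remaining, function(j) {
      fit <- lm(Y ~ X[, c(chosen, j), drop = FALSE])
      sum(residuals(fit)^2)
    })
    best <- remaining[which.min(rss)]  # candidate with the smallest RSS enters
    chosen <- c(chosen, best)
    remaining <- setdiff(remaining, best)
  }
  chosen
}

set.seed(7)  # arbitrary seed
Y <- rnorm(100)
X <- matrix(rnorm(1000), nrow = 100, ncol = 10)
colnames(X) <- paste0("X", 1:10)
path <- forward_path(Y, X)
colnames(X)[path]  # row k of the selection table holds the first k of these
```

The first variable to enter is always the one most correlated with Y, because a one-variable model has RSS = TSS(1 - r²).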

(c) Smallest Cp

Among the models that regsubsets() reports (one to eight variables), the smallest Cp is -1.5096560, for the model containing only the intercept and X10 (V10 in the R output).

(d) Smallest BIC

Among the same models, the smallest BIC is 7.295025, again for the model containing only the intercept and X10.
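regsubsets() only lists models with one or more variables, so the intercept-only model never enters the Cp and BIC comparison above. A sketch that computes both criteria by hand, including that model. Cp is Mallows' statistic RSS_k/σ̂² + 2(k + 1) - n, with σ̂² the residual variance of the full model. BIC is written here as n log(RSS_k/TSS) + (k + 1) log n; I believe this matches the scaling leaps reports, but treat that as an assumption. Other BIC conventions differ only by a constant for a given data set, so the ranking of models is the same. The helper `criteria` and the seed are illustrative.

```r
# Hypothetical helper: Mallows' Cp and BIC for the model using the columns `vars` of X
# (vars = integer(0) gives the intercept-only model).
# sigma2_full is the residual variance of the full model with all columns of X.
criteria <- function(Y, X, vars, sigma2_full) {
  n <- length(Y)
  k <- length(vars)
  fit <- if (k == 0) lm(Y ~ 1) else lm(Y ~ X[, vars, drop = FALSE])
  rss <- sum(residuals(fit)^2)
  tss <- sum((Y - mean(Y))^2)
  c(Cp  = rss / sigma2_full + 2 * (k + 1) - n,    # k + 1 parameters incl. intercept
    BIC = n * log(rss / tss) + (k + 1) * log(n))  # assumed scaling; ranking unaffected
}

set.seed(11)  # arbitrary seed
Y <- rnorm(100)
X <- matrix(rnorm(1000), nrow = 100, ncol = 10)
full <- lm(Y ~ X)
sigma2_full <- sum(residuals(full)^2) / df.residual(full)

# Intercept-only model plus the ten one-variable models
models <- c(list(integer(0)), as.list(1:10))
tab <- t(sapply(models, function(v) criteria(Y, X, v, sigma2_full)))
rownames(tab) <- c("intercept only", paste0("X", 1:10))
tab[order(tab[, "Cp"]), ]  # models sorted from smallest to largest Cp
```

Two sanity checks follow from the formulas: the full model always has Cp = p + 1 = 11, and the intercept-only model has BIC = log(n).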

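(e) The explanatory variables here are not correlated: by construction they are independent of each other and of Y. The danger is selecting noise. When the number of candidate variables is a substantial fraction of the sample size, a selection procedure searches over many models and will usually find some variables that fit the noise by chance (here X10, X7, X6, ...). The selected model then looks better than it is: the usual R², t-tests and p-values computed after selection ignore the search, so they are too optimistic, and the chosen variables will not predict new data. In this example the true model is the intercept-only model, so every selected variable is a false discovery.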
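To see the danger concretely: with 10 independent noise predictors, the first step of forward selection picks the variable with the smallest simple-regression p-value. Because the X's are independent normal columns, the ten marginal tests are independent, so the chance that at least one has a nominal p-value below 0.05 is 1 - 0.95^10 ≈ 0.40. A simulation sketch (the helper `min_marginal_p` and the seed are illustrative; `cor.test()`'s Pearson p-value equals the slope t-test p-value of the simple regression):

```r
# Hypothetical helper: smallest simple-regression p-value among p noise predictors
min_marginal_p <- function(n = 100, p = 10) {
  Y <- rnorm(n)
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  pvals <- apply(X, 2, function(x) cor.test(x, Y)$p.value)
  min(pvals)
}

set.seed(1)  # arbitrary seed
minp <- replicate(1000, min_marginal_p())
mean(minp < 0.05)  # close to 1 - 0.95^10, about 0.40
```

So in roughly 40% of pure-noise data sets, the first variable chosen would look "significant" if tested naively after selection.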