Question
Load the prostate data from the faraway package into R. Let lpsa be the outcome and treat all of the other variables as predictors. Set aside the first observation/subject from the dataset.
(a) Fit the model using all of the data except the first observation. Summarize the fit of the model, describing the relationship between the various predictors and lpsa.
(b) Remove any predictors that are not significant at α = 0.1. Summarize the fit of the new/sub model and contrast it with the full model.
(c) Form 95% prediction intervals for the first observation (that was left out). Compare and contrast the predictions and intervals. Which model (full or sub) seems to predict better in this case?
(d) Based on your results above, do you think there is sufficient evidence to prefer one model over the other? Why or why not?
Explanation / Answer
# all R commands
library(faraway) # package which contains the prostate dataset
class(prostate) # gives the type/class of the data
train=prostate[-1,] # set aside the first observation/subject
#a)
y=train$lpsa # response variable lpsa (column 9)
fit=lm(lpsa~.,data=train) # full model: all 8 remaining variables as predictors
pairs(train) # pairwise scatterplots of the response and predictors
cor(train) # correlation matrix
m=model.matrix(fit) # design matrix X
beta=solve(t(m)%*%m)%*%t(m)%*%y # betahat=(X'X)^(-1)X'y
beta
b0=beta[1] # intercept
b0
names(summary(fit))
betaest=summary(fit)$coefficients
betaest[,1] # lm() reproduces the hand-computed estimates
t(m)%*%m%*%beta # left side of the normal equations X'X betahat
t(m)%*%y # right side X'y; the two should match
H=m%*%solve(t(m)%*%m)%*%t(m) # hat matrix
res=y-(H%*%y) # residuals e=(I-H)y
var(res)
resest=summary(fit)$residuals
errsig=summary(fit)$sigma # residual standard error
errvar=errsig^2 # estimated error variance
errvar
round(vcov(fit),4) # estimated covariance matrix of betahat
ve=diag(vcov(fit))
se=sqrt(ve) # standard errors of the coefficient estimates
se
par(mfrow=c(2,2))
plot(fit) # diagnostic plots
shapiro.test(y) # normality check on the response
shapiro.test(resest) # normality check on the residuals
f=summary(fit)
f
ftb=qf(0.95,8,87) # F critical value; the F test checks the overall adequacy of the model (8 predictors, 96-9=87 residual df)
ftb
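# A quick sketch using the fit object above: summary(fit) also stores the
# observed F statistic, which can be compared against ftb or turned into a p-value.
fobs=summary(fit)$fstatistic # value, numerator df, denominator df
fobs
1-pf(fobs[1],fobs[2],fobs[3]) # overall p-value for the regression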
# now test the significance of each predictor: under H0: beta_j=0,
# betahat_j/se(betahat_j) follows a t distribution with 87 residual df
tstat=beta/se # vectorized t statistics (intercept plus the 8 slopes)
tstat
tb=abs(qt(0.025,87)) # two-sided t critical value at alpha=0.05
tb
# or the calculated t values can be read directly from summary(fit)
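# for example, the following pulls them out directly (same numbers as tstat above):
summary(fit)$coefficients[,"t value"]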
ci=confint(fit)
ci
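# As a bridge to (b), one sketch of how to screen predictors at alpha=0.1:
# pull the p-values from the coefficient table and keep those below the threshold.
pvals=summary(fit)$coefficients[,4] # column 4 holds Pr(>|t|)
pvals[pvals<0.1] # candidates to keep in the sub model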
#b) new/sub model: remove the predictors that are not significant at alpha=0.1
y=train$lpsa # response variable lpsa
y
x1=train$lcavol # regressor/predictor lcavol (the strongest predictor in the full model)
x1
subfit=lm(y~x1) # sub model; named subfit so the full-model fit is kept for (c)
names(subfit)
plot(y~x1,xlab="lcavol",ylab="lpsa")
abline(subfit)
summary(subfit)
coef=coef(subfit) # extract both coefficients
b0=coef[1]
b1=coef[2]
b0
b1
m=model.matrix(~x1) # design matrix of the sub model
m
xtxi=solve(t(m)%*%m)
betaest=xtxi%*%t(m)%*%y # hand-computed estimates again
betaest
b0=betaest[1]
b1=betaest[2]
b0
b1
solve(crossprod(m,m),crossprod(m,y)) # a numerically stabler way to solve the normal equations
fitobject=summary(subfit)
fitobject
names(fitobject)
rsq=fitobject$r.squared # R^2 of the sub model
rsq
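# For contrast (a sketch reusing the fit object above): the full model's R^2 shows
# how much explanatory power the sub model gives up for its simplicity.
summary(fit)$r.squared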
par(mfrow=c(2,2))
plot(subfit) # diagnostic plots for the sub model
shapiro.test(y)
res=residuals(subfit)
shapiro.test(res) # normality check on the sub-model residuals
fitobject=summary(subfit)
fitobject$coefficients
ci=confint(subfit)
ci
ciintercept=ci[1,] # 95% CI for the intercept
ciintercept
cislope=ci[2,] # 95% CI for the slope on lcavol
cislope
#c) Form 95% prediction intervals for the first observation (that was left out), one from each model, then compare the predictions and interval widths; see the sketch below.
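# A minimal sketch, assuming the fit (full model) and subfit (sub model) objects
# from above: 95% prediction intervals for the held-out first observation.
newobs=prostate[1,]
predict(fit,newdata=newobs,interval="prediction",level=0.95) # full model
predict(subfit,newdata=data.frame(x1=newobs$lcavol),interval="prediction",level=0.95) # sub model
prostate$lpsa[1] # the actual observed lpsa, for comparison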
#d) Based on the results above, the full model seems preferable to the sub model, because its prediction error for the held-out observation is smaller; you can run all the commands above and verify this.
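# One sketch of how to make that comparison concrete, reusing newobs from (c):
# absolute prediction errors and prediction-interval widths for the two models.
pifull=predict(fit,newdata=newobs,interval="prediction")
pisub=predict(subfit,newdata=data.frame(x1=newobs$lcavol),interval="prediction")
abs(pifull[1]-prostate$lpsa[1]) # full-model prediction error
abs(pisub[1]-prostate$lpsa[1]) # sub-model prediction error
pifull[3]-pifull[2] # full-model interval width (upr-lwr)
pisub[3]-pisub[2] # sub-model interval width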