Question
This question uses the cars data in the datasets package with distance as the response and speed as the predictor.
(a) Plot distance against speed. Use lm() to get a linear fit to the data and add the fit on the plot.
(b) Use a “residuals vs fit” plot to check if there is any non-constant variance or non-linearity problem. State the main problem and explain why in one or two sentences.
(c) Use a normal Q-Q probability plot to check if the normality assumption is met. State the main problem and explain why in one or two sentences.
(d) The Shapiro-Wilk test is a test of normality of a numeric variable. The null hypothesis for this test is that the variable is normally distributed. Use the R function shapiro.test() to test whether the residuals of the linear fit in part (a) are normally distributed. State the p-value of this test and your conclusion given α = 0.05. Does the result support your conclusion in part (c)? (Use the code ?shapiro.test or help(shapiro.test) to understand how to use this function.)
(e) Now use sqrt(dist) as the response and fit a linear model. Show the fit on the same plot.
Explanation / Answer
data <- cars
head(data)
# Columns of cars: speed (mph) and dist (stopping distance, ft)
dist <- data$dist
speed <- data$speed
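###(a)
# Scatter plot of distance against speed, with the least-squares line added
plot(dist ~ speed, data = data, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
model <- lm(dist ~ speed, data = data)
abline(model)
model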
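###(b)
residual <- resid(model)
# Residuals vs fitted values, with a horizontal reference line at zero
plot(fitted(model), residual, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
#### Looking at the residuals vs fit plot, we can say that the relationship is not well described by a straight line: the residuals do not scatter evenly around zero.
#### Also check whether their spread widens as the fitted values increase, which would indicate non-constant variance.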
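###(c)
# Built-in Q-Q plot of the standardized residuals (diagnostic plot number 2)
plot(model, which = 2)
# Same check by hand: qqnorm() needs a numeric vector, not the lm object
qqnorm(resid(model))
qqline(resid(model))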
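###(d)
# Compare the reported p-value with 0.05 and check that the conclusion agrees with the Q-Q plot in (c)
shapiro.test(residual)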
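###(e)
# Fit the model with sqrt(dist) as the response
model2 <- lm(sqrt(dist) ~ speed, data = data)
model2
# Redraw the scatter plot with the linear fit from (a), then overlay the sqrt fit.
# The sqrt fit is squared to bring it back to the original distance scale
# (one reading of "the same plot"; plotting sqrt(dist) against speed is another).
plot(dist ~ speed, data = data)
abline(model)
speed_grid <- seq(min(data$speed), max(data$speed), length.out = 100)
lines(speed_grid, predict(model2, newdata = data.frame(speed = speed_grid))^2, lty = 2)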
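For part (d), the decision at α = 0.05 can be made explicit by pulling the p-value out of the test object rather than reading it off the printed output. The following is a minimal sketch, not part of the original answer; the names `alpha`, `fit`, `sw`, and `reject_normality` are illustrative assumptions.

```r
# Sketch for part (d): Shapiro-Wilk test on the residuals of dist ~ speed.
# `alpha`, `fit`, `sw`, and `reject_normality` are illustrative names.
alpha <- 0.05
fit <- lm(dist ~ speed, data = cars)   # same model as in part (a)
sw <- shapiro.test(resid(fit))         # returns an object of class "htest"

# The p-value is stored in the `p.value` component of the result.
reject_normality <- sw$p.value < alpha

cat("W =", round(unname(sw$statistic), 4), " p-value =", round(sw$p.value, 4), "\n")
if (reject_normality) {
  cat("p < alpha: reject the null hypothesis that the residuals are normal.\n")
} else {
  cat("p >= alpha: no evidence against normality at this level.\n")
}
```

A p-value below `alpha` would agree with a Q-Q plot whose points bend away from the reference line in (c); a larger p-value would not contradict the normality assumption.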