Question

This question uses the cars data in the datasets package with distance as the response and speed as the predictor.

(a) Plot distance against speed. Use lm() to get a linear fit to the data and add the fit on the plot.

(b) Use a “residuals vs fit” plot to check if there is any non-constant variance or non-linearity problem. State the main problem and explain why in one or two sentences.

(c) Use a normal Q-Q probability plot to check if the normality assumption is met. State the main problem and explain why in one or two sentences.

(d) The Shapiro-Wilk test is a test of normality of a numeric variable. The null hypothesis for this test is that the variable is normally distributed. Use the R function shapiro.test() to test whether the residuals of the linear fit in part (a) are normally distributed. State the p-value of this test and your conclusion given α = 0.05. Does the result support your conclusion in part (c)? (Use the code ?shapiro.test or help(shapiro.test) to understand how to use this function.)

(e) Now use sqrt(dist) as the response and fit a linear model. Show the fit on the same plot.

Explanation / Answer

data <- cars   # cars ships with the datasets package
head(data)

speed <- data$speed   # predictor
dist <- data$dist     # response
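###(a)
model <- lm(dist ~ speed, data = data)
model
plot(dist ~ speed, data = data, xlab = "Speed", ylab = "Distance")
abline(model)   # add the least-squares line to the scatterplot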
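###(b)
residual <- resid(model)
plot(fitted(model), residual, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # reference line at zero
#### The residuals-vs-fitted plot suggests the relationship is not linear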
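###(c)
#### qqnorm() needs a numeric vector, so pass the residuals rather than the model
qqnorm(resid(model))
qqline(resid(model))   # reference line for a normal distribution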
###(d)
sw <- shapiro.test(residual)
sw
#### Reject normality at the 0.05 level if sw$p.value < 0.05
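###(e)
dist2 <- sqrt(dist)   # square-root-transformed response
model2 <- lm(dist2 ~ speed)
model2
plot(speed, dist2, xlab = "Speed", ylab = "sqrt(Distance)")
abline(model2)   # fitted line on the square-root scale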
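
Part (e) asks for the fit to be shown on the same plot. One possible reading is that the square-root fit should be drawn on the original distance-versus-speed scatterplot next to the linear fit. The sketch below follows that assumption by squaring the fitted values to return to the original scale. The names fit_lin, fit_sqrt, grid, and curve_sqrt are illustrative and not part of the original answer.

```r
# Sketch: overlay the linear fit and the back-transformed sqrt fit on the
# original scale (assumes "same plot" means the dist-vs-speed scatterplot).
data(cars, package = "datasets")

fit_lin  <- lm(dist ~ speed, data = cars)        # fit from part (a)
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)  # fit from part (e)

# Predict on the sqrt scale over a fine grid of speeds, then square the
# predictions to express them in the original units of dist.
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 100))
curve_sqrt <- predict(fit_sqrt, newdata = grid)^2

plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(fit_lin, col = "blue")                        # straight-line fit
lines(grid$speed, curve_sqrt, col = "red", lty = 2)  # back-transformed sqrt fit
legend("topleft",
       legend = c("dist ~ speed", "sqrt(dist) ~ speed, squared"),
       col = c("blue", "red"), lty = c(1, 2))
```

Squaring the fitted values is a simple back-transformation for display; it is not an unbiased estimate of the mean distance, so the curve is best read as a visual comparison rather than a replacement for the sqrt-scale diagnostics.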
