


Question

(a) Create a vector y containing the values 1 through 5 and 11 through 15. Try doing this without listing out all the values.

(b) Create a vector x containing "Yes" as the first 5 elements and "No" as the second 5 elements. Do this using the rep() function.

(c) Regress y on x with the lm() function and print the summary of the fit. Is x significant? Why does this make sense based on how we generated the data?

(d) How does being in the "Yes" group affect the value of the response compared to being in the "No" group?

Explanation / Answer

R output for parts (a) through (d):

> #(a) create a vector y containing values 1 through 5 and 11 through 15

> y=c(1:5,11:15)

> y

[1] 1 2 3 4 5 11 12 13 14 15

> #(b) create a vector x containing "Yes" as the first 5 elements and "No" as the second 5 elements

> x=c(rep("Yes",5),rep("No",5))

> x

[1] "Yes" "Yes" "Yes" "Yes" "Yes" "No" "No" "No" "No" "No"
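The same vector can also be built with a single rep() call by using its each argument. A short sketch for comparison; the name x_alt is introduced here only for illustration and is not part of the original answer:

```r
# Construction used in the answer above
x <- c(rep("Yes", 5), rep("No", 5))

# Alternative: repeat each element of c("Yes", "No") five times
# (x_alt is a hypothetical name used only for this comparison)
x_alt <- rep(c("Yes", "No"), each = 5)

# Check that both approaches give the same character vector
identical(x, x_alt)
```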

> #(c) regress y on x with the lm() function

> l=lm(y~x)

> l

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept)         xYes  
         13          -10  

> #summary of the fit

> summary(l)

Call:

lm(formula = y ~ x)

Residuals:

   Min     1Q Median     3Q    Max 
    -2     -1      0      1      2 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  13.0000     0.7071   18.39 7.89e-08 ***
xYes        -10.0000     1.0000  -10.00 8.49e-06 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.581 on 8 degrees of freedom

Multiple R-squared: 0.9259, Adjusted R-squared: 0.9167

F-statistic: 100 on 1 and 8 DF, p-value: 8.488e-06
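To connect this output to parts (c) and (d): R orders the levels of x alphabetically, so "No" is the reference group. The intercept should then be the mean of y in the "No" group, and the xYes coefficient should be the difference between the "Yes" mean and the "No" mean. The sketch below checks this; the names group_means and fit are introduced here for illustration and do not appear in the transcript above:

```r
# Recreate the data from parts (a) and (b)
y <- c(1:5, 11:15)
x <- c(rep("Yes", 5), rep("No", 5))

# Mean of y within each group
group_means <- tapply(y, x, mean)
group_means

# Refit the model and look at the coefficients
fit <- lm(y ~ x)
coef(fit)

# Difference in group means, mean("Yes") - mean("No"),
# which should match the xYes coefficient
as.numeric(group_means["Yes"] - group_means["No"])
```

Read this way, the p-value for xYes (8.49e-06) is far below 0.05, so x is significant. That is expected, because the "Yes" values (1 through 5) and the "No" values (11 through 15) do not overlap and their means differ by 10, while the spread within each group is small. For part (d), being in the "Yes" group lowers the expected response by 10 units relative to the "No" group.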