Question
3. Researchers collected a random sample of data (shown below) on infants’ birthweights (Y, in lbs), gestation period (X1, in weeks), and variable X2 (X2 = 1 if baby’s last name begins with A, =2 if last name begins with B,..., =26 if last name begins with Z).
Data:

Y     X1   X2
====  ===  ===
7.50  38    4
8.00  39   14
6.75  36   11
6.50  36    2
7.25  37   23
7.00  37    1
5.50  35   24
7.50  38    5
8.00  39   15
6.75  36   12

a) Run a regression of Y on X1 and X2. Report all relevant results. You should, naturally, use SAS.

b) Is the improvement due to the addition of X2 (to a model already including X1) significant? Test at the 0.05 significance level.

c) Comment on the test in (b) – is the result expected?

d) Calculate the square of the partial correlation between Y and X1 given X2, and between Y and X2 given X1. Do this in two ways: 1) algebraically, using the SSEs from the appropriate full and reduced models; and 2) via SAS, using “/ pcorr1 pcorr2” after the model statement. What, then, are the two actual estimated partial correlation coefficients? To what two partial F tests do these correspond, respectively?
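The SSE-based arithmetic in parts (b) and (d) can be checked outside SAS. Below is a minimal, self-contained Python sketch (the helper name `ols_sse` is my own, not from the course): it fits the full and reduced models by solving the normal equations, then forms the squared partial correlations and the two partial F statistics from the SSEs.

```python
# Plain-Python sketch (not SAS) of the part (b)/(d) computations,
# using the 10 observations from the data table above.

def ols_sse(X_cols, y):
    """Return (coefficients, SSE) for OLS of y on an intercept plus X_cols,
    solving the normal equations (X'X) b = X'y by Gaussian elimination."""
    n = len(y)
    X = [[1.0] + [col[i] for col in X_cols] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for k in range(p):  # forward elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    coef = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        coef[k] = (b[k] - sum(A[k][c] * coef[c] for c in range(k + 1, p))) / A[k][k]
    sse = sum((y[i] - sum(coef[j] * X[i][j] for j in range(p))) ** 2 for i in range(n))
    return coef, sse

Y  = [7.5, 8.0, 6.75, 6.5, 7.25, 7.0, 5.5, 7.5, 8.0, 6.75]
X1 = [38, 39, 36, 36, 37, 37, 35, 38, 39, 36]
X2 = [4, 14, 11, 2, 23, 1, 24, 5, 15, 12]
n = len(Y)

_, sse_full = ols_sse([X1, X2], Y)   # full model: Y ~ X1 + X2
_, sse_x1   = ols_sse([X1], Y)       # reduced model: Y ~ X1
_, sse_x2   = ols_sse([X2], Y)       # reduced model: Y ~ X2

# Squared partial correlations from SSE drops
r2_y_x2_given_x1 = (sse_x1 - sse_full) / sse_x1
r2_y_x1_given_x2 = (sse_x2 - sse_full) / sse_x2

# Matching partial F statistics (1 numerator df, n - 3 denominator df)
F_x2 = (sse_x1 - sse_full) / (sse_full / (n - 3))
F_x1 = (sse_x2 - sse_full) / (sse_full / (n - 3))

print(round(r2_y_x2_given_x1, 4), round(r2_y_x1_given_x2, 4))  # 0.0318 0.9174
print(round(F_x2, 3), round(F_x1, 3))                          # 0.23 77.774
```

In SAS the same quantities come from “/ pcorr1 pcorr2” on the model statement; the sketch simply makes the SSE algebra explicit.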
e) Compute the residuals from a regression of Y on X2, and from a regression of X1 on X2. (HINT: Use the “output” statement in SAS PROC REG to get SAS to output these residuals for you when you fit both models separately. Call the variable containing the residuals “residY” in the first output dataset, and call it “residX1” in the second).
Merge the two output datasets together and produce a plot of “residY” on the y-axis vs. “residX1” on the x-axis (do the plot using PROC PLOT). You may also want to note that if you run the full multiple regression in PROC REG with “/ partial” after the model statement, one of the partial plots SAS produces should be the same as yours. Now, perform the corresponding SLR of “residY” (as the outcome variable) vs. “residX1” (as the predictor variable).
Finally, what are the estimated slope and the R-squared value for this SLR, and to which regression coefficient (from part a) and which squared partial correlation coefficient (from part d) in the full MLR model do they correspond? Try to think about why it makes intuitive sense that these connections occur!
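The residual-on-residual (added-variable) regression of part (e) can also be sketched outside SAS. The plain-Python example below follows the hint's variable names (`residY`, `residX1`); the SLR slope and R-squared it prints are the quantities part (e) asks you to connect back to parts (a) and (d).

```python
# Plain-Python sketch (not SAS) of part (e): regress Y on X2 and X1 on X2,
# then run the SLR of the Y-residuals on the X1-residuals.

def fit(xs, ys):
    """Simple-regression (intercept, slope) of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

Y  = [7.5, 8.0, 6.75, 6.5, 7.25, 7.0, 5.5, 7.5, 8.0, 6.75]
X1 = [38, 39, 36, 36, 37, 37, 35, 38, 39, 36]
X2 = [4, 14, 11, 2, 23, 1, 24, 5, 15, 12]

a_y, b_y = fit(X2, Y)    # Y  ~ X2
a_x, b_x = fit(X2, X1)   # X1 ~ X2
residY  = [y  - (a_y + b_y * x2) for y,  x2 in zip(Y,  X2)]
residX1 = [x1 - (a_x + b_x * x2) for x1, x2 in zip(X1, X2)]

# SLR of residY on residX1 (the added-variable regression)
_, slope = fit(residX1, residY)

# Its R-squared is the squared correlation of the two residual series;
# both series have mean zero, so raw sums of products suffice.
num = sum(u * v for u, v in zip(residX1, residY)) ** 2
den = sum(u * u for u in residX1) * sum(v * v for v in residY)
r2 = num / den

print(round(slope, 4), round(r2, 4))  # 0.5233 0.9174
```

The slope reproduces the X1 coefficient from the full MLR and the R-squared reproduces the squared partial correlation of Y and X1 given X2 — the Frisch–Waugh connection that part (e) is driving at.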
Explanation / Answer
Regression of Y on X1 and X2 (summary output):

Regression Statistics
Multiple R          0.9598
R Square            0.9211
Adjusted R Square   0.8986
Standard Error      0.2404
Observations        10

ANOVA
Source       df   SS        MS        F         Significance F
Regression    2   4.72664   2.36332   40.8868   0.000138
Residual      7   0.40461   0.05780
Total         9   5.13125

Coefficients
Term        Coefficient   Std. Error   t Stat    P-value     95% CI
Intercept   -12.2864       2.2240      -5.5245   0.000883    (-17.5453, -7.0276)
X1            0.5233       0.0593       8.8190   0.0000487   ( 0.3830,  0.6636)
X2           -0.0048       0.0099      -0.4795   0.6462      (-0.0282,  0.0187)

The intercept and X1 are significant at the 0.05 level; X2 is not (p = 0.65) and should be dropped from the model. R-squared is 0.92, indicating a very good fit.
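As a cross-check of the output above, the X1 and X2 lines of the coefficient table can be reproduced with the textbook centered-sums formulas for two-predictor OLS. This plain-Python sketch (not the SAS/Excel route used in the answer) also shows that the square of X2's t statistic is the partial F statistic from part (b):

```python
# Cross-check of the reported coefficient table via centered sums of squares.
import math

Y  = [7.5, 8.0, 6.75, 6.5, 7.25, 7.0, 5.5, 7.5, 8.0, 6.75]
X1 = [38, 39, 36, 36, 37, 37, 35, 38, 39, 36]
X2 = [4, 14, 11, 2, 23, 1, 24, 5, 15, 12]
n = len(Y)

ybar, x1bar, x2bar = sum(Y) / n, sum(X1) / n, sum(X2) / n
s11 = sum((x - x1bar) ** 2 for x in X1)
s22 = sum((x - x2bar) ** 2 for x in X2)
s12 = sum((a - x1bar) * (b - x2bar) for a, b in zip(X1, X2))
s1y = sum((a - x1bar) * (y - ybar) for a, y in zip(X1, Y))
s2y = sum((a - x2bar) * (y - ybar) for a, y in zip(X2, Y))

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det    # X1 coefficient
b2 = (s2y * s11 - s1y * s12) / det    # X2 coefficient
b0 = ybar - b1 * x1bar - b2 * x2bar   # intercept

sse = sum((y - (b0 + b1 * a + b2 * c)) ** 2 for y, a, c in zip(Y, X1, X2))
mse = sse / (n - 3)
t1 = b1 / math.sqrt(mse * s22 / det)  # t stat for X1
t2 = b2 / math.sqrt(mse * s11 / det)  # t stat for X2

print(round(b0, 4), round(b1, 4), round(b2, 6))
print(round(t1, 2), round(t2, 4), round(t2 ** 2, 3))  # t2^2 = partial F for X2
```

The partial F for X2 comes out near 0.23, far below F(0.05; 1, 7) ≈ 5.59, which is why the answer concludes X2 is not significant — as expected for a predictor (alphabetical position of the last name) with no plausible connection to birthweight.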