Problem # 2: Residential sales that occurred during the year 2005 were available
ID: 3041627 • Letter: P
Question
Problem # 2: Residential sales that occurred during the year 2005 were available from a city in the Midwest. Data on 50 arms-length transactions include sales price (y, in thousands), finished square feet (x1, in thousands), number of bedrooms (x2), lot size (x3, in thousands), year built (x4, coded so that 2005 = 50), and distance from a popular highway (x5, in miles). The city tax assessor was interested in predicting sales price based on the demographic variable information given above. The data are not reproduced here. However, the S-Plus output is provided below.

Coefficients:
                 Value   Std. Error   t value   Pr(>|t|)
(Intercept)   314.1311     143.5370    2.1885     0.0340
x1              0.0291       0.0142    2.0496     0.0464
x2             16.7504       7.5896    2.2070     0.0326
x3              1.9283       3.1550    0.6112     0.5442
x4             -1.9159       2.4940   -0.7682     0.4465
x5             -5.1714       2.0729   -2.4948     0.0164

Residual standard error: 89.77 on 44 degrees of freedom
Multiple R-Squared: 0.7266
F-statistic: 23.390 on 5 and 44 degrees of freedom, the p-value is 0.0000

Correlation of Coefficients:
      (Intercept)
x1        -0.6466
x2        -0.5736
x3        -0.4838
x4        -0.1646
x5        -0.1623
Remaining entries of the lower triangle, in the order printed: 0.0299 0.2660 0.3397 0.0746 0.1487 0.2737 -0.0625 0.0732 -0.0571 -0.2684

(a) Determine the multiple regression equation for the residential sales data.
(b) Suppose you do not know the p-values; find the best predictor for Y.
(c) Interpret the coefficient of determination, R2.
(d) Estimate the sale price of a house with 2500 finished square feet, 4 bedrooms, a 5500 square foot lot, built in 2000 and 20 miles from the highway.
(e) Estimate the sale price of a house with 2500 finished square feet, 4 bedrooms, a 7500 square foot lot, built in 2000 and 20 miles from the highway. Compare the results in (d) and (e).
(f) At the 5% significance level, does it appear that any of the predictor variables can be removed from the full model as unnecessary?
(g) Obtain and interpret 99% confidence intervals for the coefficient.
(h) Test the hypothesis H0: β = 9 versus H1: β > 9 (the coefficient subscript is not legible in the source). Use α = 0.02.

Explanation / Answer
a)
The multiple regression equation is formed from the estimated coefficients in the output:
Y = 314.1311 + 0.0291 x1 + 16.7504 x2 + 1.9283 x3 - 1.9159 x4 - 5.1714 x5
where Y is the predicted sales price in thousands.
b)
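If we do not know the p-values, we must check each predictor and how much of the variation in the dependent variable Y it explains. The predictor that explains the most variation is then selected. From the output alone, the t value (coefficient divided by its standard error) serves this purpose: the predictor with the largest absolute t value is the strongest. Here that is x5 (|t| = 2.4948), followed by x2 (t = 2.2070) and x1 (t = 2.0496).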
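The comparison can be sketched in Python using only the printed estimates and standard errors. This is a minimal sketch that assumes the absolute t value is the measure of a predictor's strength; the dictionary names are illustrative, and the sign of the x4 estimate follows the regression equation in part (a).

```python
# Coefficient estimates and standard errors as printed in the S-Plus output.
# The sign of x4 follows the regression equation written in part (a).
coef = {"x1": 0.0291, "x2": 16.7504, "x3": 1.9283, "x4": -1.9159, "x5": -5.1714}
std_err = {"x1": 0.0142, "x2": 7.5896, "x3": 3.1550, "x4": 2.4940, "x5": 2.0729}

# t value = estimate / standard error
t_values = {name: coef[name] / std_err[name] for name in coef}

# Rank predictors from strongest to weakest by absolute t value.
ranking = sorted(t_values, key=lambda name: abs(t_values[name]), reverse=True)

for name in ranking:
    print(f"{name}: t = {t_values[name]:.4f}")
```

The first name in `ranking` is the predictor with the largest absolute t value.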
c)
The R2 value is 0.7266. This means that the model explains 72.66% of the variation in the sales price Y through the independent variables x1, x2, ..., x5.
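As a consistency check, R2 can be tied back to the F-statistic in the output through the standard identity F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below assumes n = 50 observations and k = 5 predictors, as stated in the problem; the variable names are illustrative.

```python
# Values taken from the problem statement and the S-Plus output.
r_squared = 0.7266
n = 50          # number of transactions
k = 5           # number of predictors

df_resid = n - k - 1  # residual degrees of freedom (44 in the output)

# Overall F statistic implied by R-squared.
f_stat = (r_squared / k) / ((1 - r_squared) / df_resid)

print(f"residual df = {df_resid}, implied F = {f_stat:.3f}")
```

The implied value agrees with the printed F-statistic of 23.390 up to the rounding of R2.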
d)
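Substitute the given values into the regression equation, using the units in which the variables were recorded: x1 = 2.5 (2500 finished square feet, in thousands), x2 = 4, x3 = 5.5 (5500 square feet of lot, in thousands), x4 = 45 and x5 = 20. Since 2005 is coded as 50, a house built in 2000 is coded as 45.
Y = 314.13 + 0.0291*2.5 + 16.7504*4 + 1.9283*5.5 - 1.9159*45 - 5.1714*20 = 202.17
The estimated sale price is about 202.17 thousand.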
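The substitution can be sketched in Python. This is a minimal sketch under stated assumptions: x1 and x3 are entered in thousands of square feet, as the problem statement specifies; the year is coded as year minus 1955 so that 2005 maps to 50 and 2000 maps to 45; and the function name `predict_price` is hypothetical.

```python
def predict_price(sqft_thousands, bedrooms, lot_thousands, year_built, miles_to_highway):
    """Predicted sales price (in thousands) from the fitted equation in part (a)."""
    # Assumed year coding: 2005 -> 50, so 2000 -> 45.
    x4 = year_built - 1955
    return (
        314.1311
        + 0.0291 * sqft_thousands
        + 16.7504 * bedrooms
        + 1.9283 * lot_thousands
        - 1.9159 * x4
        - 5.1714 * miles_to_highway
    )


# House from part (d): 2500 sq ft, 4 bedrooms, 5500 sq ft lot, built 2000, 20 miles away.
price_d = predict_price(2.5, 4, 5.5, 2000, 20)
print(f"Estimated price: {price_d:.2f} thousand")
```

Keeping the unit conversion at the call site makes it explicit which inputs are scaled to thousands.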
Please note that we can answer only four subparts of a question at a time, as per the answering guidelines.