Question
Use R to solve following questions and paste procedures and conclusions.
ISE 529 Engineering Data Analytics Homework 2
Due date September 15, 2017.
A real estate appraiser is interested in predicting residential home prices in a mid-western city as a
function of various features. For that purpose a regression model is to be constructed from a sample
of 522 houses. Download the homes.xls data set from blackboard.
Consider the predictors
x1: lot size (square feet), x2: area (square feet), x3: number of bedrooms,
x4: number of bathrooms, x5: year of construction, x6: garage size (number of cars).
8. Interpret the slope value b1.
Fit a full model for houses having between two to four bedrooms
9. Interpret adequacy values (MSE, R2).
10. Find a 95% prediction interval for the price of a house with a garage for two cars, area of 3150
square feet, built in 1996, 26250 square feet size, two bedrooms, three bathrooms.
price area beds baths garage year style lotsize ac pool quality highway 360000 3032 4 4 2 1972 1 22221 YES NO MEDIUM NO 340000 2058 4 2 2 1976 1 22912 YES NO MEDIUM NO 250000 1780 4 3 2 1980 1 21345 YES NO MEDIUM NO 205500 1638 4 2 2 1963 1 17342 YES NO MEDIUM NO 275500 2196 4 3 2 1968 7 21786 YES NO MEDIUM NO 248000 1966 4 3 5 1972 1 18902 YES YES MEDIUM NO 229900 2216 3 2 2 1972 7 18639 YES NO MEDIUM NO 150000 1597 2 1 1 1955 1 22112 YES NO MEDIUM NO 195000 1622 3 2 2 1975 1 14321 YES NO LOW NO 160000 1976 3 3 1 1918 1 32358 NO NO LOW NO 190000 2812 7 5 2 1966 7 56639 NO YES LOW NO 559000 2791 3 4 3 1992 1 30595 YES NO HIGH NO 535000 3381 5 4 3 1988 7 23172 YES NO HIGH NO 525000 3459 5 4 2 1978 5 35351 YES NO HIGH NO 299900 2090 3 3 2 1987 1 24025 YES NO MEDIUM NO 527000 3232 5 5 2 1984 6 21445 YES NO MEDIUM NO 169900 1502 2 2 2 1956 1 28958 YES NO MEDIUM NO 335250 2747 3 4 2 1993 7 22241 YES NO MEDIUM NO 323900 2890 4 3 2 1954 7 41992 YES NO MEDIUM NO 200000 1825 3 3 2 1957 1 30266 YES NO MEDIUM NO 211000 1578 4 3 2 1986 2 18829 YES NO MEDIUM NO 212000 1763 3 3 2 1959 1 24726 YES NO MEDIUM NO 245000 2517 4 3 2 1965 1 23261 YES NO MEDIUM NO 140400 1872 3 2 2 1985 3 24017 YES NO MEDIUM NO 295000 3266 3 3 2 1908 6 24881 YES NO MEDIUM NO 170900 2020 1 2 1 1956 1 21385 YES NO MEDIUM NO 229000 2164 4 2 2 1965 1 28291 YES NO MEDIUM NO 218500 2080 3 2 2 1959 1 14752 YES YES MEDIUM NO 160000 2208 2 2 2 1985 7 8058 YES NO MEDIUM NO 259000 3048 6 4 3 1960 7 29307 YES NO MEDIUM NO 164500 1460 3 2 2 1978 1 9999 YES NO MEDIUM NO 280000 2540 3 2 2 1940 5 42428 NO NO MEDIUM NO 154000 2208 2 2 2 1985 7 6746 YES NO MEDIUM NO 272000 2560 4 2 3 1977 5 36100 YES NO MEDIUM NO 180000 2061 4 2 2 1958 1 20138 NO NO MEDIUM NO 157500 1980 3 2 2 1957 1 32519 YES NO MEDIUM NO 242500 3308 5 4 2 1928 5 47323 YES NO MEDIUM NO 182000 2616 5 3 2 1955 5 11123 NO NO LOW NO 178000 1460 4 2 2 1961 1 27095 YES NO LOW NO 171900 1580 2 1 1 1951 4 12417 NO NO LOW NO 165500 1460 3 2 2 1960 1 22493 YES NO LOW NO 183500 1540 3 2 2 1992 3 15801 YES NO LOW NO 135000 1388 2 1 2 1951 1 26106 NO NO LOW NO 175000 1624 3 2 2 1948 1 39219 YES NO LOW NO 149500 1580 2 1 2 1966 1 11166 YES NO LOW NO 177500 1820 3 2 2 1960 1 22104 YES NO LOW NO 155000 1733 4 1 1 1936 4 22398 YES NO LOW NO 145000 1896 3 2 2 1925 6 32753 NO NO LOW NO 178000 2038 2 2 2 1918 7 47884 NO NO LOW NO 156000 1436 3 2 3 1920 1 43594 NO NO LOW NO 159000 1690 3 2 1 1922 5 28518 NO NO LOW NO 160000 1496 2 2 1 1900 5 43335 NO NO LOW NO 112000 1668 2 1 1 1948 1 19612 NO NO LOW NO 84000 980 1 1 1 1951 1 17686 NO NO LOW NO 155000 2562 3 2 2 1885 7 40800 NO NO LOW NO 360000 2304 5 4 3 1978 1 70240 YES NO MEDIUM NO 104000 1268 2 1 1 1947 1 21067 NO NO LOW NO 420000 2283 3 3 3 1997 1 18524 YES NO HIGH YES 355000 2060 2 3 2 1997 1 38623 YES NO MEDIUM YES 165000 2087 2 2 2 1966 1 24764 YES NO MEDIUM YES 244000 2081 4 2 2 1980 3 24993 YES NO MEDIUM YES 179900 1696 3 3 2 1978 2 22294 YES NO MEDIUM YES 253000 2222 4 2 2 1955 1 71527 NO NO MEDIUM YES 200000 2110 5 3 2 1957 1 15332 YES NO MEDIUM YES 200000 1774 4 2 2 1963 1 15528 NO NO LOW YES 147700 1592 3 2 2 1957 1 11221 YES NO LOW YES 188700 1748 3 2 2 1972 1 23939 YES NO LOW YES 177000 1985 3 1 2 1948 1 69975 NO NO LOW YES 585000 2558 2 4 3 1984 3 24601 YES YES HIGH NO 549900 4000 6 5 3 1979 10 23595 YES YES HIGH NO 675000 3942 4 3 2 1990 7 18920 YES NO HIGH NO 830000 3889 4 4 3 1991 7 28378 YES NO HIGH NO 920000 3857 4 5 3 1997 1 32793 YES NO HIGH NO 855000 4756 4 4 3 1990 7 22215 YES NO HIGH NO 585500 3302 4 3 3 1982 7 26463 YES NO HIGH NO 399000 2629 3 3 2 1989 9 24778 YES NO HIGH NO 790000 4418 5 5 3 1997 7 22024 YES NO HIGH NO 665000 4746 4 4 3 1996 7 23368 YES NO HIGH NO 725000 3242 3 3 3 1989 1 27173 YES NO HIGH NO 647000 2464 3 3 3 1992 1 31703 YES NO HIGH NO 780000 4419 4 5 7 1987 1 56127 YES NO HIGH NO 657500 3877 3 3 3 1992 7 24639 YES NO HIGH NO 578000 3808 5 4 3 1982 7 23324 YES NO HIGH NO 500000 3376 5 4 2 1947 7 18452 YES NO HIGH NO 484530 2940 3 3 3 1979 7 20639 YES YES HIGH NO 609000 2654 5 3 3 1997 1 12821 YES NO HIGH NO 635000 2672 4 3 3 1995 1 28049 YES NO HIGH NO 519000 3386 4 4 3 1994 7 24008 YES NO HIGH NO 625100 3648 4 4 3 1992 7 26604 YES NO HIGH NO 585444 3455 4 5 3 1995 7 22468 YES NO HIGH NO 399900 3321 3 4 3 1971 7 15012 YES YES HIGH NO 389900 2817 4 3 3 1996 7 31214 YES NO HIGH NO 649000 3210 3 5 3 1995 1 30033 YES NO HIGH NO 535000 3588 4 5 3 1987 7 22530 YES NO HIGH NO 640000 2705 3 3 3 1994 1 22196 YES NO HIGH NO 600000 2344 4 3 2 1925 1 86004 YES NO HIGH NO 582500 4264 5 4 4 1995 7 24983 YES NO HIGH NO 545000 2615 3 3 3 1996 1 21722 YES NO HIGH NO
Explanation / Answer
(8)we fit the regression equation
y=b0+b1x1+b2x2+b3x3+b4x4+b5x5+b6x6 and generate following information
y=-4880157+2.9339x1+151.8846x2-12863x3+33692x4+2392.0817x5+8602.2592x5
slope b1 =2.93393 is per unit change in price y if other variable x2,x3,x4,x5,x6 are fixed
(9) here MSE=89521 and R2=0.8358
(10) garage for two cars(x6=2), area of 3150(x2)
square feet, built in 1996(x5), 26250(x1) square feet size, two bedrooms(x3), three bathrooms.(x4) the predicted value=542444
y=-4880157+2.9339*26250+151.8846*3150-12863*2+33692*3+2392.0817*1996+8602.2592*2=542444
here
Analysis of Variance Source DF Sum of Mean F Value P-value Squares Square Model 6 3.10E+12 5.17E+11 64.46 <.0001 Error 76 6.09E+11 8014044112 Corrected Total 82 3.71E+12 Root MSE 89521 R-Square 0.8358 Dependent Mean 330093 Adj R-Sq 0.8228 Coeff Var 27.12 Parameter Estimates Variable DF Parameter Standard t Value P-value Estimate Error Intercept 1 -4880157 1073571 -4.55 <.0001 x1 1 2.93393 0.90266 3.25 0.0017 x2 1 151.88459 19.69699 7.71 <.0001 x3 1 -12863 14682 -0.88 0.3837 x4 1 33692 16702 2.02 0.0472 x5 1 2392.0817 552.47154 4.33 <.0001 x6 1 8602.2592 16239 0.53 0.5979