Data description We consider a dataset relating gas mileage, horsepower and othe
Question
Data description
We consider a dataset relating gas mileage, horsepower and other information for 324 different cars. More precisely, for each car, the following variables are reported:
• mpg - Miles per gallon;
• cylinders - Number of cylinders, between 4 and 8;
• horsepower - Engine horsepower;
• weight - Vehicle weight (lbs.);
• acceleration - Time to accelerate from 0 to 60 mph (sec.);
• year - Model year (modulo 100);
• origin - Origin of car (0. American, 1. Japanese).
In our analysis, we are interested in the variables affecting mpg through the linear regression approach.
1. We have reported in Figure 1 the scatter plots of mpg versus cylinders, horsepower, weight, acceleration, and year. Briefly discuss the relationship between mpg and each regressor according to the scatter plot, and comment on what result you expect to get by running a linear regression.
Figure 1: Scatterplots of mpg versus, from left to right: cylinders, horsepower, weight, acceleration, year.

Table 1: Simple linear regression: mpg vs horsepower

| | Coeff | Std. Error | t value | 95% conf. int. |
|---|---|---|---|---|
| intercept | 39.31 | 0.800 | 49.12 | [37.74, 40.88] |
| horsepower | -0.15 | 0.007 | 22.24 | ?? |

R2: 60.5%, adj-R2: 60.4%

Table 2: Multiple linear regression: mpg vs horsepower and horsepower^2

| | Coeff | Std. Error | t value | 95% conf. int. |
|---|---|---|---|---|
| intercept | 58.78 | 2.04 | 28.88 | [54.77, 62.79] |
| horsepower | -0.50 | 0.03 | 14.49 | [-0.56, -0.43] |
| horsepower^2 | 0.00 | 0.00013 | 10.18 | [0.00107, 0.00159] |

R2: 70.2%, adj-R2: 70.0%

Explanation / Answer
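As a quick check on how the table columns relate to each other, the sketch below recomputes the t value and an approximate 95% interval for the intercept row of Table 1 from its printed coefficient and standard error. The function names `t_value` and `approx_ci95` are hypothetical helpers. The quantile 1.96 is an assumption (the normal approximation), so results may differ from the table in the last digit because the printed values are rounded.

```python
def t_value(coef, std_err):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


def approx_ci95(coef, std_err, quantile=1.96):
    """Approximate 95% interval: coef +/- quantile * std_err.

    quantile=1.96 is an assumed normal approximation, not a value
    taken from the tables.
    """
    half_width = quantile * std_err
    return (coef - half_width, coef + half_width)


# Intercept row of Table 1, using the values as printed in the table.
intercept_t = t_value(39.31, 0.800)
intercept_ci = approx_ci95(39.31, 0.800)

print(round(intercept_t, 2))                      # 49.14 (table reports 49.12)
print(tuple(round(b, 2) for b in intercept_ci))   # (37.74, 40.88)
```

The same two functions can be applied to any other row for which the coefficient and standard error are given.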
From the scatter plot of mpg vs. cylinders, the relationship appears to be quadratic (a 2nd-degree polynomial).
From the scatter plot of mpg (y) vs. horsepower (x), the relationship is decreasing and nonlinear, roughly of the form y = x^(-2).
From the scatter plot of mpg (y) vs. weight (x), the relationship is decreasing and nonlinear, roughly of the form y = x^(-1).
From the scatter plot of mpg (y) vs. acceleration (x), there is no apparent linear pattern: the correlation between x and y is approximately zero.
From the scatter plot of mpg (y) vs. year (x), the relationship is approximately linear, i.e. y = a + bx.
From Table 1, the simple linear regression of mpg on horsepower explains 60.5% of the total variation in mpg, whereas the 2nd-degree polynomial regression of mpg on horsepower explains 70.2% (see Table 2). The adjusted R2 also rises, from 60.4% to 70.0%, and the squared term has a t value of 10.18, so the improvement is not merely a consequence of adding one more regressor. Hence the 2nd-degree polynomial regression is preferable to the simple linear regression.
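The comparison between a straight-line fit and a 2nd-degree polynomial fit can be sketched as follows. The original dataset is not available here, so the data below are synthetic: the curve coefficients, the noise level and the seed are made-up assumptions chosen only to produce a curved, decreasing trend. The helpers `r_squared` and `adjusted_r2` are hypothetical names, and the adjusted R2 formula used is the standard 1 - (1 - R2)(n - 1)/(n - p - 1) for p regressors.

```python
import random


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic stand-in data (NOT the original dataset; all constants are assumptions).
random.seed(0)
n = 324
hp = [random.uniform(50, 230) for _ in range(n)]
mpg = [60 - 0.5 * h + 0.0013 * h * h + random.gauss(0, 4) for h in hp]

# Center and rescale horsepower so the normal equations stay well conditioned;
# R^2 is unchanged by this reparameterization.
mean_hp = sum(hp) / n
z = [(h - mean_hp) / 100 for h in hp]
z2 = [v * v for v in z]

r2_lin = r_squared([z], mpg)
r2_quad = r_squared([z, z2], mpg)
print("linear:    R2 =", round(r2_lin, 3), " adj-R2 =", round(adjusted_r2(r2_lin, n, 1), 3))
print("quadratic: R2 =", round(r2_quad, 3), " adj-R2 =", round(adjusted_r2(r2_quad, n, 2), 3))

# Consistency check on the reported tables, using only their printed R2 values.
print(round(adjusted_r2(0.605, 324, 1), 3))  # 0.604, matching Table 1
print(round(adjusted_r2(0.702, 324, 2), 3))  # 0.7, matching Table 2
```

Because the linear model is nested in the quadratic one, R2 can never decrease when the squared term is added. The adjusted R2 is the fairer comparison, since it penalizes the extra regressor.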