Question
Problem #4, exercise 3.10, page 123 (Table B.11, Wine Quality Data). Please use R. All code must be shown.
The quality of Pinot Noir wine is thought to be related to the properties of clarity, aroma, body, flavor, and oakiness. Data for 38 wines are given in Table B.11.
a.- Fit a multiple linear regression model relating wine quality to these regressors.
b.- Test for significance of regression. What conclusions can you draw?
c.- Use t tests to assess the contribution of each regressor to the model. Discuss your findings.
d.- Calculate R^2 and adjusted R^2 for this model. Compare these values to the R^2 and adjusted R^2 for the linear regression model relating wine quality to aroma and flavor. Discuss your results.
e.- Find a 95% CI for the regression coefficient for flavor for both models in part d. Discuss any differences.
Table B.11 (please use R and show the code):
Clarity (x1)   Aroma (x2)   Body (x3)   Flavor (x4)   Oakiness (x5)   Quality (y)   Region
1.0            3.3          2.8         3.1           4.1             9.8           1
1.0            4.4          4.9         3.5           3.9             12.6          1
1.0            3.9          5.3         4.8           4.7             11.9          1
1.0            3.9          2.6         3.1           3.6             11.1          1
1.0            5.6          5.1         5.5           5.1             13.3          1
1.0            4.6          4.7         5.0           4.1             12.8          1
1.0            4.8          4.8         4.8           3.3             12.8          1
1.0            5.3          4.5         4.3           5.2             12.0          1
1.0            4.3          4.3         3.9           2.9             13.6          3
1.0            4.3          3.9         4.7           3.9             13.9          1
1.0            5.1          4.3         4.5           3.6             14.4          3
0.5            3.3          5.4         4.3           3.6             12.3          2
0.8            5.9          5.7         7.0           4.1             16.1          3
0.7            7.7          6.6         6.7           3.7             16.1          3
1.0            7.1          4.4         5.8           4.1             15.5          3
0.9            5.5          5.6         5.6           4.4             15.5          3
1.0            6.3          5.4         4.8           4.6             13.8          3
1.0            5.0          5.5         5.5           4.1             13.8          3
1.0            4.6          4.1         4.3           3.1             11.3          1
0.9            3.4          5.0         3.4           3.4             7.9           2
0.9            6.4          5.4         6.6           4.8             15.1          3
1.0            5.5          5.3         5.3           3.8             13.5          3
0.7            4.7          4.1         5.0           3.7             10.8          2
0.7            4.1          4.0         4.1           4.0             9.5           2
1.0            6.0          5.4         5.7           4.7             12.7          3
1.0            4.3          4.6         4.7           4.9             11.6          2
1.0            3.9          4.0         5.1           5.1             11.7          1
1.0            5.1          4.9         5.0           5.1             11.9          2
1.0            3.9          4.4         5.0           4.4             10.8          2
1.0            4.5          3.7         2.9           3.9             8.5           2
1.0            5.2          4.3         5.0           6.0             10.7          2
0.8            4.2          3.8         3.0           4.7             9.1           1
1.0            3.3          3.5         4.3           4.5             12.1          1
1.0            6.8          5.0         6.0           5.2             14.9          3
0.8            5.0          5.7         5.5           4.8             13.5          1
0.8            3.5          4.7         4.2           3.3             12.2          1
0.8            4.3          5.5         3.5           5.8             10.3          1
0.8            5.2          4.8         5.7           3.5             13.2          1
Explanation / Answer
The R code is as follows:
# read the data into an R data frame (use forward slashes in the path so R
# does not treat "\" as an escape character)
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/clarity.csv", header = TRUE)
str(data.df)
# drop the Region column (column 7), which is not needed for this model
data.df <- data.df[, -7]
# fit the full regression model with all five regressors
fit <- lm(Quality ~ ., data = data.df)
summary(fit)
The results are:
> summary(fit)
Call:
lm(formula = Quality ~ ., data = data.df)
Residuals:
Min 1Q Median 3Q Max
-2.85552 -0.57448 -0.07092 0.67275 1.68093
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9969 2.2318 1.791 0.082775 .
Clarity 2.3395 1.7348 1.349 0.186958
Aroma 0.4826 0.2724 1.771 0.086058 .
Body 0.2732 0.3326 0.821 0.417503
Flavor 1.1683 0.3045 3.837 0.000552 ***
Oakiness -0.6840 0.2712 -2.522 0.016833 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.163 on 32 degrees of freedom
Multiple R-squared: 0.7206, Adjusted R-squared: 0.6769
F-statistic: 16.51 on 5 and 32 DF, p-value: 4.703e-08
Using the estimated coefficients, the fitted regression equation is
Quality = 3.9969 + 2.3395 Clarity + 0.4826 Aroma + 0.2732 Body + 1.1683 Flavor - 0.6840 Oakiness
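As a quick check, the same coefficients can be pulled directly from the fitted object (this uses the fit object defined above):
# estimated coefficients of the full model, rounded to four decimals;
# they should match the Estimate column of summary(fit) shown above
round(coef(fit), 4)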
The R^2 value is 0.7206, so the model explains about 72.06% of the variation in wine quality.
The adjusted R^2 is 0.6769, which is smaller than R^2 because adjusted R^2 penalizes the model for the number of regressors used.
The overall F-test gives a p-value of 4.703e-08, so the regression is statistically significant.
Looking at the individual t tests in the output above, Flavor (p = 0.00055) and Oakiness (p = 0.017) contribute significantly at the 5% level, while Clarity, Aroma, and Body do not (Aroma is significant only at the 10% level).
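These summary quantities can also be extracted programmatically; a minimal sketch, assuming the fit object above:
s <- summary(fit)
s$r.squared        # multiple R-squared (0.7206 here)
s$adj.r.squared    # adjusted R-squared (0.6769 here)
# p-value of the overall F-test, recomputed from the stored F statistic and its degrees of freedom
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)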
We then rerun the analysis using only two regressors, Aroma and Flavor:
> fit <- lm(Quality ~ Aroma + Flavor, data=data.df)
> summary(fit)
Call:
lm(formula = Quality ~ Aroma + Flavor, data = data.df)
Residuals:
Min 1Q Median 3Q Max
-2.19048 -0.60300 -0.03203 0.66039 2.46287
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.3462 1.0091 4.307 0.000127 ***
Aroma 0.5180 0.2759 1.877 0.068849 .
Flavor 1.1702 0.2905 4.027 0.000288 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.229 on 35 degrees of freedom
Multiple R-squared: 0.6586, Adjusted R-squared: 0.639
F-statistic: 33.75 on 2 and 35 DF, p-value: 6.811e-09
Using the estimated coefficients, the fitted regression equation is
Quality = 4.3462 + 0.5180 Aroma + 1.1702 Flavor
The R^2 value is 0.6586, so this model explains about 65.86% of the variation in wine quality.
The adjusted R^2 is 0.639, again smaller than R^2 because of the penalty for the number of regressors used.
The overall p-value is 6.811e-09, so this model is also statistically significant.
Compared with the full model, R^2 falls from 0.7206 to 0.6586 and adjusted R^2 falls from 0.6769 to 0.639, so the two-regressor model explains somewhat less of the variation while using three fewer regressors.
Please note that, per the answering guidelines, we can answer only four subparts of a question at a time.
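For part (e), the 95% confidence interval for the Flavor coefficient in each model can be obtained with confint(). A minimal sketch, assuming the two models above are refit under separate names (fit.full and fit.af are illustrative names, not from the original code):
# refit both models under distinct names so their intervals can be compared
fit.full <- lm(Quality ~ ., data = data.df)               # full five-regressor model
fit.af   <- lm(Quality ~ Aroma + Flavor, data = data.df)  # aroma + flavor model
confint(fit.full, "Flavor", level = 0.95)   # 95% CI for Flavor in the full model
confint(fit.af, "Flavor", level = 0.95)     # 95% CI for Flavor in the two-regressor model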