Problem # 4, exercise 3.10, page 123 (table B.11 Wine Quality Data). Please use

Question

Problem # 4, exercise 3.10, page 123 (table B.11 Wine Quality Data). Please use R. All code must be shown.

The quality of Pinot Noir wine is thought to be related to the properties of clarity, aroma, body, flavor, and oakiness. Data for 38 wines are given in table B.11.

a.- Fit a multiple linear regression model relating wine quality to these regressors.

b.- Test for significance of regression. What conclusions can you draw?

c.- Use t tests to assess the contribution of each regressor to the model. Discuss your findings.

d.- Calculate R^2 and R^2 Adj for this model. Compare these values to the R^2 and R^2 Adj for the linear regression model relating wine quality to aroma and flavor. Discuss your results.

e.- Find a 95% CI for the regression coefficient for flavor for both models in part d. Discuss any differences.

Table B.11. (Please use R, and show the code).

Clarity, x1   Aroma, x2   Body, x3   Flavor, x4   Oakiness, x5   Quality, y   Region
1             3.3         2.8        3.1          4.1            9.8          1
1             4.4         4.9        3.5          3.9            12.6         1
1             3.9         5.3        4.8          4.7            11.9         1
1             3.9         2.6        3.1          3.6            11.1         1
1             5.6         5.1        5.5          5.1            13.3         1
1             4.6         4.7        5            4.1            12.8         1
1             4.8         4.8        4.8          3.3            12.8         1
1             5.3         4.5        4.3          5.2            12           1
1             4.3         4.3        3.9          2.9            13.6         3
1             4.3         3.9        4.7          3.9            13.9         1
1             5.1         4.3        4.5          3.6            14.4         3
0.5           3.3         5.4        4.3          3.6            12.3         2
0.8           5.9         5.7        7            4.1            16.1         3
0.7           7.7         6.6        6.7          3.7            16.1         3
1             7.1         4.4        5.8          4.1            15.5         3
0.9           5.5         5.6        5.6          4.4            15.5         3
1             6.3         5.4        4.8          4.6            13.8         3
1             5           5.5        5.5          4.1            13.8         3
1             4.6         4.1        4.3          3.1            11.3         1
0.9           3.4         5          3.4          3.4            7.9          2
0.9           6.4         5.4        6.6          4.8            15.1         3
1             5.5         5.3        5.3          3.8            13.5         3
0.7           4.7         4.1        5            3.7            10.8         2
0.7           4.1         4          4.1          4              9.5          2
1             6           5.4        5.7          4.7            12.7         3
1             4.3         4.6        4.7          4.9            11.6         2
1             3.9         4          5.1          5.1            11.7         1
1             5.1         4.9        5            5.1            11.9         2
1             3.9         4.4        5            4.4            10.8         2
1             4.5         3.7        2.9          3.9            8.5          2
1             5.2         4.3        5            6              10.7         2
0.8           4.2         3.8        3            4.7            9.1          1
1             3.3         3.5        4.3          4.5            12.1         1
1             6.8         5          6            5.2            14.9         3
0.8           5           5.7        5.5          4.8            13.5         1
0.8           3.5         4.7        4.2          3.3            12.2         1
0.8           4.3         5.5        3.5          5.8            10.3         1
0.8           5.2         4.8        5.7          3.5            13.2         1


Explanation / Answer

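The R code is as follows:

# read the data into an R data frame
# (forward slashes are used because "\U" inside an R string starts a Unicode escape and fails to parse)
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/clarity.csv", header = TRUE)
str(data.df)

# drop the variable that is not needed (column 7, Region in the table's column order)
data.df <- data.df[, -7]

# fit the regression model of Quality on all remaining regressors
fit <- lm(Quality ~ ., data = data.df)
summary(fit)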
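
The file path above is specific to the answerer's machine. As a minimal sketch, assuming the seven columns are in the order given in Table B.11, the same data frame can be built inline with read.table(text = ...) so that the fit runs anywhere. The names wine.txt, wine.df, and fit.full are illustrative and do not come from the original answer.

```r
# Table B.11 typed inline; column order assumed to match the question's table
wine.txt <- "
Clarity Aroma Body Flavor Oakiness Quality Region
1 3.3 2.8 3.1 4.1 9.8 1
1 4.4 4.9 3.5 3.9 12.6 1
1 3.9 5.3 4.8 4.7 11.9 1
1 3.9 2.6 3.1 3.6 11.1 1
1 5.6 5.1 5.5 5.1 13.3 1
1 4.6 4.7 5 4.1 12.8 1
1 4.8 4.8 4.8 3.3 12.8 1
1 5.3 4.5 4.3 5.2 12 1
1 4.3 4.3 3.9 2.9 13.6 3
1 4.3 3.9 4.7 3.9 13.9 1
1 5.1 4.3 4.5 3.6 14.4 3
0.5 3.3 5.4 4.3 3.6 12.3 2
0.8 5.9 5.7 7 4.1 16.1 3
0.7 7.7 6.6 6.7 3.7 16.1 3
1 7.1 4.4 5.8 4.1 15.5 3
0.9 5.5 5.6 5.6 4.4 15.5 3
1 6.3 5.4 4.8 4.6 13.8 3
1 5 5.5 5.5 4.1 13.8 3
1 4.6 4.1 4.3 3.1 11.3 1
0.9 3.4 5 3.4 3.4 7.9 2
0.9 6.4 5.4 6.6 4.8 15.1 3
1 5.5 5.3 5.3 3.8 13.5 3
0.7 4.7 4.1 5 3.7 10.8 2
0.7 4.1 4 4.1 4 9.5 2
1 6 5.4 5.7 4.7 12.7 3
1 4.3 4.6 4.7 4.9 11.6 2
1 3.9 4 5.1 5.1 11.7 1
1 5.1 4.9 5 5.1 11.9 2
1 3.9 4.4 5 4.4 10.8 2
1 4.5 3.7 2.9 3.9 8.5 2
1 5.2 4.3 5 6 10.7 2
0.8 4.2 3.8 3 4.7 9.1 1
1 3.3 3.5 4.3 4.5 12.1 1
1 6.8 5 6 5.2 14.9 3
0.8 5 5.7 5.5 4.8 13.5 1
0.8 3.5 4.7 4.2 3.3 12.2 1
0.8 4.3 5.5 3.5 5.8 10.3 1
0.8 5.2 4.8 5.7 3.5 13.2 1
"
wine.df <- read.table(text = wine.txt, header = TRUE)

# drop Region by name rather than by position, so a reordered file cannot drop the wrong column
wine.df$Region <- NULL

# same model as in the answer: Quality on the five remaining regressors
fit.full <- lm(Quality ~ ., data = wine.df)
summary(fit.full)
```

Dropping the column by name is a small design choice: it keeps the code correct even if the CSV columns arrive in a different order.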

The results are:

> summary(fit)

Call:
lm(formula = Quality ~ ., data = data.df)

Residuals:
     Min       1Q   Median       3Q      Max
-2.85552 -0.57448 -0.07092  0.67275  1.68093

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.9969     2.2318   1.791 0.082775 .
Clarity       2.3395     1.7348   1.349 0.186958
Aroma         0.4826     0.2724   1.771 0.086058 .
Body          0.2732     0.3326   0.821 0.417503
Flavor        1.1683     0.3045   3.837 0.000552 ***
Oakiness     -0.6840     0.2712  -2.522 0.016833 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.163 on 32 degrees of freedom
Multiple R-squared: 0.7206, Adjusted R-squared: 0.6769
F-statistic: 16.51 on 5 and 32 DF, p-value: 4.703e-08

The regression equation formed from these coefficients is

Quality = 3.9969 + 2.3395 Clarity + 0.4826 Aroma + 0.2732 Body + 1.1683 Flavor - 0.6840 Oakiness

R^2 is 0.7206, which means the model explains about 72.06% of the variation in wine quality.

The adjusted R^2 is 0.6769. It is lower than R^2 because adjusted R^2 penalises the model for the number of independent variables used.
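
The link between the two values can be checked by hand. The sketch below applies the standard formula R^2 Adj = 1 - (1 - R^2)(n - 1)/(n - p) to the printed R^2, taking n = 38 wines and p = 6 estimated coefficients (the intercept plus five regressors). The variable names are illustrative.

```r
r2 <- 0.7206   # Multiple R-squared copied from the summary output
n  <- 38       # number of wines
p  <- 6        # estimated coefficients: intercept + 5 regressors

# adjusted R^2 rescales the unexplained share by (n - 1) / (n - p)
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p)
round(adj.r2, 4)   # 0.6769, matching the summary output
```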

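Overall, the p-value of the F test is 4.703e-08, so the regression is statistically significant: at least one of the five regressors is linearly related to quality. From the t tests in the coefficient table, Flavor (p = 0.000552) and Oakiness (p = 0.016833) are significant at the 5% level, while Clarity, Aroma, and Body are not, given the other regressors in the model.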
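
As a rough check on these tests, the overall F statistic can be recomputed from R^2, and each t value can be compared with the two-sided 5% critical value on 32 residual degrees of freedom. This is a sketch that works only from numbers copied out of the summary output. The variable names are illustrative.

```r
r2 <- 0.7206   # Multiple R-squared from the summary output
n  <- 38       # number of wines
k  <- 5        # number of regressors

# overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))
f.stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p.val  <- pf(f.stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
c(F = f.stat, p = p.val)

# two-sided 5% critical value for the individual t tests
t.crit <- qt(0.975, df = n - k - 1)

# t values copied from the coefficient table
t.vals <- c(Clarity = 1.349, Aroma = 1.771, Body = 0.821,
            Flavor = 3.837, Oakiness = -2.522)
abs(t.vals) > t.crit   # TRUE marks regressors significant at the 5% level
```

Because the inputs are rounded, the recomputed F statistic and p-value agree with the printed ones only to about three significant figures.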

We rerun the analysis using only two regressors, Aroma and Flavor:

> fit <- lm(Quality ~ Aroma + Flavor, data=data.df)
> summary(fit)

Call:
lm(formula = Quality ~ Aroma + Flavor, data = data.df)

Residuals:
     Min       1Q   Median       3Q      Max
-2.19048 -0.60300 -0.03203  0.66039  2.46287

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.3462     1.0091   4.307 0.000127 ***
Aroma         0.5180     0.2759   1.877 0.068849 .
Flavor        1.1702     0.2905   4.027 0.000288 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.229 on 35 degrees of freedom
Multiple R-squared: 0.6586, Adjusted R-squared: 0.639
F-statistic: 33.75 on 2 and 35 DF, p-value: 6.811e-09

The regression equation formed from these coefficients is

Quality = 4.3462 + 0.5180 Aroma + 1.1702 Flavor

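R^2 is 0.6586, which means the model explains about 65.86% of the variation in wine quality.

The adjusted R^2 is 0.639, again lower than R^2 because of the penalty for the number of independent variables used. Compared with the five-regressor model (R^2 = 0.7206, adjusted R^2 = 0.6769), both values are lower, so dropping Clarity, Body, and Oakiness costs some explanatory power even after adjusting for the number of regressors.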
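
The original answer does not formally test the difference between the two models. One common way to do so, offered here as a supplementary sketch rather than as part of the original solution, is a partial F test on the three dropped regressors (Clarity, Body, Oakiness). The statistic can be computed from the two printed R^2 values. The variable names are illustrative.

```r
r2.full <- 0.7206   # five-regressor model
r2.red  <- 0.6586   # Aroma + Flavor model
n       <- 38       # number of wines
df.full <- n - 6    # residual df of the full model (32)
r       <- 3        # regressors dropped: Clarity, Body, Oakiness

# partial F: gain in R^2 per dropped regressor, relative to the
# unexplained share per residual df in the full model
f.partial <- ((r2.full - r2.red) / r) / ((1 - r2.full) / df.full)
p.partial <- pf(f.partial, df1 = r, df2 = df.full, lower.tail = FALSE)
c(F = f.partial, p = p.partial)
```

If both fitted models are kept under separate names, calling anova() on the reduced and full fits carries out the same comparison. The original code reuses the name fit for both, so one of the models would have to be stored under a different name first.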

Overall, the p-value of the F test is 6.811e-09, so this model is also statistically significant.
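
Part e is not worked out above. As a minimal sketch, a 95% confidence interval for the Flavor coefficient in each model can be formed as estimate ± t(0.975, df) × standard error, using the estimates, standard errors, and residual degrees of freedom printed in the two summary tables. The variable names are illustrative.

```r
# Flavor estimate and standard error, copied from the two summary tables
est <- c(full = 1.1683, reduced = 1.1702)
se  <- c(full = 0.3045, reduced = 0.2905)
df  <- c(full = 32,     reduced = 35)    # residual degrees of freedom

t.crit <- qt(0.975, df)                  # two-sided 95% critical values

ci <- cbind(lower = est - t.crit * se,
            upper = est + t.crit * se,
            width = 2 * t.crit * se)
round(ci, 4)
```

With a fitted model object available, confint(fit, "Flavor", level = 0.95) returns the same interval directly. The two intervals overlap almost entirely, and the reduced model's interval is slightly narrower because its standard error is smaller and it has more residual degrees of freedom.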

Please note that we can answer only 4 subparts of a question at a time, as per the answering guidelines.
