9. Calculate R^2 from the definition, using the sums of squares, and interpret it
Question
9. Calculate R^2 from the definition, using the sums of squares, and interpret it. Show that the R^2 value is equal to the square of the Pearson correlation coefficient.
DATA FILE HERE: https://expirebox.com/download/12b419270d0365657d8559cc5c29c7dc.html
MORE INFO:
Fat deposits in the trunk of the body may be more closely linked with bad health outcomes than fat in general. It can be hard to measure deep abdominal adipose tissue directly.
Despres et al. proposed that waist circumference might be a good predictor variable for deep abdominal adipose tissue. (We’ll see, over the course of our work with these data, that there is an association between these variables, but that we cannot predict precisely enough to use waist circumference as a good measure of an individual’s deep abdominal fat.)
In this dataset, the individuals received CT scans and the area of deep abdominal adipose tissue was measured. Even this isn’t perfect: the measurement is an area (in cm^2), not a volume!
The outcome variable is the area of deep abdominal adipose tissue from the CT scan; the explanatory variable is the individual’s waist circumference in cm.
DATA:
summ waist_circ deep_ab_adipose

    Variable |   Obs       Mean   Std. Dev.     Min     Max
-------------+---------------------------------------------
  waist_circ |   109   91.90184    13.55912    63.5     121
deep_ab_ad~e |   109    101.894    57.29476   11.44     253

. corr deep_ab_adipose waist_circ
(obs=109)

             | deep_a~e   waist_~c
-------------+--------------------
deep_ab_ad~e |   1.0000
  waist_circ |   0.8186     1.0000

. regr deep_ab_adipose waist_circ

      Source |          SS    df           MS      Number of obs =     109
-------------+---------------------------------    F(1, 107)     =  217.28
       Model |  237548.516     1   237548.516      Prob > F      =  0.0000
    Residual |  116981.988   107   1093.28961      R-squared     =  0.6700
-------------+---------------------------------    Adj R-squared =  0.6670
       Total |  354530.504   108   3282.68985      Root MSE      =  33.065

---------------------------------------------------------------------------
deep_ab_ad~e |      Coef.   Std. Err.       t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  waist_circ |   3.458859    .2346521   14.74   0.000    2.993689    3.92403
       _cons |  -215.9815    21.79627   -9.91   0.000   -259.1901  -172.7729
---------------------------------------------------------------------------

twoway (scatter deep_ab_adipose waist_circ) (line ab_adi_fit waist_circ), title(abdominal adipose tissue and waist circumference) subtitle(n = 109)
graph box ab_adi_res, title(standardized residuals) subtitle(regression of abdominal adipose tissue and waist circumference)

summ waist_circ, detail

                 waist circumference in cm
-------------------------------------------------------------
      Percentiles      Smallest
 1%       68.85          63.5
 5%        73.1         68.85
10%       74.75         71.85       Obs                 109
25%          80          71.9       Sum of Wgt.         109

50%        90.8                     Mean           91.90184
                      Largest       Std. Dev.      13.55912
75%         104           115
90%         109         119.6       Variance       183.8496
95%         111         119.9       Skewness       .1322041
99%       119.9           121       Kurtosis       1.892724

summ deep_ab_adipose, detail

     deep abdominal adipose tissue in cm sq from CT scan
-------------------------------------------------------------
      Percentiles      Smallest
 1%       21.68         11.44
 5%       28.32         21.68
10%       32.22         25.72       Obs                 109
25%       50.88         25.89       Sum of Wgt.         109

50%       96.54                     Mean            101.894
                      Largest       Std. Dev.      57.29476
75%         137           229
90%         184           241       Variance        3282.69
95%         208           245       Skewness       .5767897
99%         245           253       Kurtosis       2.672811

qnorm ab_adi_res, title(normal quantile plot for standardized residuals) subtitle(abdominal adipose tissue and waist circumference n = 109)

swilk ab_adi_res

               Shapiro-Wilk W test for normal data

    Variable |   Obs         W         V        z    Prob>z
-------------+---------------------------------------------
  ab_adi_res |   109   0.96492     3.113    2.531   0.00568

list waist_circ deep_ab_adipose ab_adi_fit se_ab_adi_ind if waist_circ > 120

     +-------------------------------------------+
     | waist_~c   deep_a~e   ab_adi~t   se_ab_~d |
     |-------------------------------------------|
 89. |      121        245   202.5405   33.91077 |
     +-------------------------------------------+

Explanation / Answer
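9) R^2 is defined as R^2 = SSR/SST, where SSR is the regression (model) sum of squares and SST is the total sum of squares.
From the regression output, SSR = 237548.516 and SSE (the residual sum of squares) = 116981.988, so SST = SSR + SSE = 237548.516 + 116981.988 = 354530.504.
R^2 = 237548.516/354530.504 = 0.6700, or 67%.
Interpretation: about 67% of the variation in deep abdominal adipose tissue area is explained by its linear relationship with waist circumference.
Pearson correlation: the corr output gives r = 0.8186, so r^2 = 0.8186^2 = 0.6701, which agrees with R^2 = 0.6700 up to rounding in the reported r.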
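The arithmetic can be double-checked with a short script. This is only a sketch in Python (not part of the original Stata session); the variable names are made up for illustration, and the three numbers are copied from the regression and correlation output in the question. Because Stata prints r rounded to four decimals, r^2 can only be expected to match R^2 to about the fourth decimal place.

```python
# Values copied from the Stata output in the question (names are illustrative).
ss_model = 237548.516      # "Model" sum of squares (SSR)
ss_residual = 116981.988   # "Residual" sum of squares (SSE)
r = 0.8186                 # Pearson correlation, as printed (rounded to 4 decimals)

# Total sum of squares is the sum of the model and residual parts.
ss_total = ss_model + ss_residual

# R^2 from the definition: share of total variation explained by the model.
r_squared = ss_model / ss_total

print(f"SST                      = {ss_total:.3f}")
print(f"R^2 from sums of squares = {r_squared:.4f}")
print(f"r^2 from correlation     = {r ** 2:.4f}")

# r is only known to 4 decimals, so allow a small rounding tolerance.
print("Agree within rounding:", abs(r_squared - r ** 2) < 1e-4)
```

The two values differ slightly in the fourth decimal place only because the printed correlation is rounded; with the unrounded r, the square equals R^2 exactly in a simple linear regression with one predictor.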