Question

Galton’s Height Data
Motivated by the work of his cousin, Charles Darwin, the English scientist Francis Galton studied the degree to which human traits were passed from one generation to the next. In an 1885 study, he measured the heights of 933 adult children and their parents. The data set which Galton created included some sets of siblings. Although one of the assumptions of linear regression is that the observations should be independent, we will ignore the possible dependence of observations from children of the same family when performing our analysis. The data set can be found at the bottom of the question.

a. Convert the categorical variable gender to an indicator variable which takes on the value 0 if the gender is female and the value 1 if the gender is male.

b. Construct a linear regression model that we can use to estimate a child’s height from their mother’s height, their father’s height and their gender. Report the fitted model.

c. By how much does a male’s mean height exceed a female’s mean height for children whose parents’ heights are the same? Report both the estimate and a 95% confidence interval for the estimate.

d. Compute a 95% Prediction Interval for a female whose father’s height is 73 inches and whose mother’s height is 67 inches.

Data Set:

Explanation / Answer

a. Convert the categorical variable gender to an indicator variable which takes on the value 0 if the gender is female and the value 1 if the gender is male.

The analysis below uses R.

The data set has been loaded into a data frame named galton.

> head(galton)
  Gender Family Height Father Mother
1   male      1   73.2   78.5   67.0
2 female      1   69.2   78.5   67.0
3 female      1   69.0   78.5   67.0
4 female      1   69.0   78.5   67.0
5   male      2   73.5   75.5   66.5
6   male      2   72.5   75.5   66.5

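Convert Gender to a 0/1 indicator by running the command below. The levels are listed explicitly so that "female" maps to 0 and "male" maps to 1.

galton$Gender = factor(galton$Gender, levels = c("female", "male"), labels = c("0", "1"))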

> head(galton)
  Gender Family Height Father Mother
1      1      1   73.2   78.5   67.0
2      0      1   69.2   78.5   67.0
3      0      1   69.0   78.5   67.0
4      0      1   69.0   78.5   67.0
5      1      2   73.5   75.5   66.5
6      1      2   72.5   75.5   66.5
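As a cross-check on the coding, here is a minimal sketch that rebuilds the six rows shown by head(galton) before the conversion and creates a numeric 0/1 indicator. The names galton_head and Male are illustrative assumptions and do not appear in the original answer.

```r
# First six rows, copied from the head(galton) output before conversion
galton_head <- data.frame(
  Gender = c("male", "female", "female", "female", "male", "male"),
  Family = c(1, 1, 1, 1, 2, 2),
  Height = c(73.2, 69.2, 69.0, 69.0, 73.5, 72.5),
  Father = c(78.5, 78.5, 78.5, 78.5, 75.5, 75.5),
  Mother = c(67.0, 67.0, 67.0, 67.0, 66.5, 66.5)
)

# Numeric indicator (hypothetical column name): 1 for male, 0 for female
galton_head$Male <- as.integer(galton_head$Gender == "male")

# Cross-tabulate the original labels against the indicator to verify the mapping
print(table(galton_head$Gender, galton_head$Male))
```

A numeric indicator and a two-level factor with levels "0" and "1" lead to the same fitted regression; only the coefficient label differs (Male versus Gender1).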

b. Construct a linear regression model that we can use to estimate a child’s height from their mother’s height, their father’s height and their gender. Report the fitted model.

The regression model is fitted by running the command below.

> model = lm(Height ~ Father + Mother + Gender, data = galton)
> model

Call:
lm(formula = Height ~ Father + Mother + Gender, data = galton)

Coefficients:
(Intercept)       Father       Mother      Gender1
    16.4322       0.3934       0.3184       5.2190

The fitted regression model is

Child Height = 16.4322 + 0.3934 Father's Height + 0.3184 Mother's Height + 5.219 Gender

where Gender is 1 for a male child and 0 for a female child, and all heights are in inches.
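To make the fitted equation concrete, the sketch below evaluates it directly from the rounded coefficients reported above. The helper fitted_height is a hypothetical name introduced only for illustration, and the parent heights are the ones given in part d of the question.

```r
# Coefficients copied (rounded) from the fitted model output above
b0       <- 16.4322
b_father <- 0.3934
b_mother <- 0.3184
b_gender <- 5.2190

# Hypothetical helper, not part of the original answer:
# gender is 1 for a male child and 0 for a female child
fitted_height <- function(father, mother, gender) {
  b0 + b_father * father + b_mother * mother + b_gender * gender
}

female_fit <- fitted_height(father = 73, mother = 67, gender = 0)
male_fit   <- fitted_height(father = 73, mother = 67, gender = 1)
print(c(female = female_fit, male = male_fit))
```

Because the coefficients here are rounded to four decimals, the results can differ in the later decimal places from what predict() returns on the fitted model object.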

c. By how much does a male’s mean height exceed a female’s mean height for children whose parents’ heights are the same? Report both the estimate and a 95% confidence interval for the estimate.

If the parents' heights are the same, the fitted mean height of a male child (Gender = 1) is

Male Child Height = 16.4322 + 0.3934 Father's Height + 0.3184 Mother's Height + 5.219

and the fitted mean height of a female child (Gender = 0) is

Female Child Height = 16.4322 + 0.3934 Father's Height + 0.3184 Mother's Height

Subtracting the second equation from the first, the parental terms cancel. So a male’s mean height exceeds a female’s mean height by an estimated 5.219 inches for children whose parents’ heights are the same.

The model summary gives the standard error of the Gender coefficient as 0.14188.

> summary(model)

Call:
lm(formula = Height ~ Father + Mother + Gender, data = galton)

Residuals:
    Min      1Q  Median      3Q     Max
-9.5280 -1.4604  0.0996  1.4783  9.1161

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.43221    2.72802   6.023 2.46e-09 ***
Father       0.39339    0.02868  13.718  < 2e-16 ***
Mother       0.31840    0.03102  10.263  < 2e-16 ***
Gender1      5.21902    0.14188  36.784  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.165 on 929 degrees of freedom
Multiple R-squared: 0.6358,   Adjusted R-squared: 0.6346
F-statistic: 540.5 on 3 and 929 DF, p-value: < 2.2e-16

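The critical value used for the 95% confidence interval is the z value 1.96, a large-sample approximation to the t critical value with 929 degrees of freedom (the exact t value is only slightly larger, so the interval changes very little). So the 95% confidence interval for the difference between male and female mean height is

(5.219 - 1.96 * 0.14188, 5.219 + 1.96 * 0.14188)

= (4.940915, 5.497085) inches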
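For comparison, the interval can also be computed with the t critical value for the model's residual degrees of freedom. This is a minimal sketch that uses only the estimate, standard error, and degrees of freedom copied from the summary output; the variable names are illustrative.

```r
# Values copied from the summary(model) output above
estimate <- 5.21902   # Gender1 coefficient
std_err  <- 0.14188   # its standard error
df_resid <- 929       # residual degrees of freedom

# t critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Estimate plus/minus critical value times standard error
ci <- estimate + c(-1, 1) * t_crit * std_err
print(t_crit)
print(ci)

# With the fitted model object in the session, the same t-based interval
# should be available from:
# confint(model, "Gender1", level = 0.95)
```

The t-based interval is marginally wider than the z-based one, which is expected with 929 degrees of freedom.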

d. Compute a 95% Prediction Interval for a female whose father’s height is 73 inches and whose mother’s height is 67 inches.

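Store the given values in a new data frame. Gender is a factor with the same levels as in the model, set to "0" for a female.

newdata = data.frame(Father = 73, Mother = 67, Gender = factor("0", levels = c("0", "1")))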

Run the command below to get the 95% prediction interval for the given values.

predict(model, newdata, interval = "prediction", level = 0.95)
     fit      lwr      upr
66.48184 62.22072 70.74296

So the predicted height is 66.48 inches, and the 95% prediction interval for the height of this female child is

(62.22072, 70.74296) inches
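As a sanity check on the reported interval, recall that a prediction interval has half-width t * s * sqrt(1 + h), where s is the residual standard error and h is the quadratic form x0' (X'X)^{-1} x0 for the new observation. The sketch below uses only numbers copied from the output above. The backed-out value of h is approximate because those inputs are rounded, and the variable names are illustrative.

```r
# Values copied from the predict() and summary(model) output above
fit      <- 66.48184
lwr      <- 62.22072
upr      <- 70.74296
resid_se <- 2.165     # residual standard error
df_resid <- 929       # residual degrees of freedom

t_crit <- qt(0.975, df = df_resid)

# The interval should be symmetric around the fitted value
midpoint   <- (lwr + upr) / 2
half_width <- (upr - lwr) / 2

# Smallest possible half-width (h = 0): t * s
min_half_width <- t_crit * resid_se

# Back out the approximate h implied by the reported interval
h <- (half_width / min_half_width)^2 - 1

print(c(midpoint = midpoint, half_width = half_width,
        min_half_width = min_half_width, h = h))
```

The implied h is small, which indicates that nearly all of the interval's width comes from the variability of individual heights rather than from uncertainty in the estimated coefficients.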