Question
---
title: "Assignment 4"
output:
  word_document: default
---
## First review the assignment.
This assignment is to produce a predictive model of median household income using other variables in the countyComplete dataframe, which is in the openintro package. You may not include per capita income. I have included one model for discussion, but you need to create a model with different choices. You may construct new variables based on those included in countyComplete.
## Load the required libraries.
```{r}
library(tidyverse)
library(openintro)
library(broom)
```
## Problem 1
Run the commands glimpse() and summary() on the dataframe to understand the meaning of the variables and to verify the integrity of the data. The documentation in the openintro package is useful for understanding the variables. Avoid variables with many missing values when you construct your model.
```{r}
# Place your code here.
```
## Problem 2
Use lm() to create a model to predict median household income using 5 other variables. Do not include per capita income. Display a summary of the model.
```{r}
# Place your code here.
```
## Problem 3
Which 2 numbers in the summary output describe the overall performance of the model? Use these two numbers in appropriate sentences to describe how well your model performed.
Place your answer here.
## Problem 4
Examine the p-values for the individual coefficients of the model. Can you reject the hypothesis that the true coefficient value is zero in every case?
## Problem 5
Look at the signs of the coefficients. Do all of them have the signs that you would expect? Note any exceptions.
## Problem 6
Consider the forecasts of median household income for three different counties. Choose the counties you want. Use the augment() function from broom. Describe how the forecasts compare with the actuals for these counties.
```{r}
# Insert your code here.
```
Put your verbal answer here.
## Problem 7
First get a more convenient dataframe, myvars, with only the variables you used.
```{r}
# Insert your code here.
```
Now look at the relationships between the pairs of variables in myvars.
```{r}
plot(myvars)
cor(myvars)
```
What do you see that might suggest a reformulation of your model?
Create the new model and produce a summary. Does the new model work better?
```{r}
# Insert your code here.
```
Explanation / Answer
---
title: "Assignment 4"
output:
  word_document: default
---
## First review the assignment.
This assignment is to produce a predictive model of median household income using other variables
in the countyComplete dataframe, which is in the openintro package. You may not include per capita income.
I have included one model for discussion, but you need to create a model with different choices.
You may construct new variables based on those included in countyComplete.
## Load the required libraries.
```{r}
library(tidyverse)
library(openintro)
library(broom)
```
## Problem 1
```{r}
# Copy the dataset to a shorter name, then inspect its structure and summary statistics.
df <- countyComplete
glimpse(df)
summary(df)
```
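One way to act on the advice about missing values is to count the NAs in each column. The sketch below uses the built-in `airquality` data as a stand-in (an assumption, chosen so the example runs without openintro); the same call should work on `df`.

```r
# Stand-in data: airquality ships with base R and contains missing values.
# Replace airquality with df to check the county data instead.
na_counts <- colSums(is.na(airquality))

# Sort so the columns with the most missing values come first.
na_counts <- sort(na_counts, decreasing = TRUE)
print(na_counts)
```

Columns near the top of this list are the ones to avoid as predictors.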
## Problem 2
Use lm() to create a model to predict median household income using 5 other variables.
Do not include per capita income. Display a summary of the model.
```{r}
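# Note: FIPS is a numeric county identifier, so its coefficient has no substantive interpretation.
fit <- lm(median_household_income ~ FIPS + female + foreign_spoken_at_home +
            mean_work_travel + density, data = df)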
summary(fit)
```
## Problem 3
Which 2 numbers in the summary output describe the overall performance of the model?
Use these two numbers in appropriate sentences to describe how well your model performed.
Place your answer here.
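The two numbers most likely intended here are the residual standard error and the (adjusted) R-squared, both printed near the bottom of the summary output. As a minimal sketch, the code below pulls them from a stand-in model on the built-in `mtcars` data (an assumption for illustration, not the assignment's model); the same fields exist on `summary(fit)`.

```r
# Stand-in model on the built-in mtcars data (not the assignment's model).
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
demo_summary <- summary(demo_fit)

# Residual standard error: typical size of a prediction error,
# in the units of the response variable.
rse <- demo_summary$sigma

# Adjusted R-squared: share of variance explained,
# penalized for the number of predictors.
adj_r2 <- demo_summary$adj.r.squared

cat("Residual standard error:", round(rse, 2), "\n")
cat("Adjusted R-squared:", round(adj_r2, 3), "\n")
```

For the income model, the residual standard error would be read in dollars, which makes it easy to state in a sentence.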
## Problem 4
Examine the p-values for the individual coefficients of the model. Can you reject the hypothesis
that the true coefficient value is zero in every case?
Yes. For each coefficient, we reject the null hypothesis that its true value is zero when its p-value is less than 0.05.
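Rather than reading the p-values by eye, they can be extracted from the coefficient table and compared with the threshold. This is a sketch on a stand-in `mtcars` model (an assumption for illustration); substituting `fit` would check the income model.

```r
# Stand-in model on the built-in mtcars data (not the assignment's model).
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# The coefficient table is a matrix; the p-values are in the "Pr(>|t|)" column.
coef_table <- coef(summary(demo_fit))
p_values <- coef_table[, "Pr(>|t|)"]

# The intercept is usually not of interest, so drop it.
p_values <- p_values[names(p_values) != "(Intercept)"]

# TRUE where the zero-coefficient hypothesis is rejected at the 5% level.
significant <- p_values < 0.05
print(significant)

# TRUE only if every predictor is significant.
all_significant <- all(significant)
print(all_significant)
```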
## Problem 5
Look at the signs of the coefficients. Do all of them have the signs that you would expect?
Note any exceptions.
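One way to make the sign check explicit is to write down the expected signs and compare them with the fitted ones. The sketch below again uses a stand-in `mtcars` model, and the `expected_signs` vector is a hypothetical set of expectations introduced only for illustration.

```r
# Stand-in model on the built-in mtcars data (not the assignment's model).
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# sign() returns -1, 0, or 1 for each coefficient.
coef_signs <- sign(coef(demo_fit))

# Hypothetical expectations: heavier and more powerful cars use more fuel,
# so both predictors should lower miles per gallon.
expected_signs <- c(wt = -1, hp = -1)

# Names of predictors whose fitted sign differs from the expectation.
unexpected <- names(expected_signs)[coef_signs[names(expected_signs)] != expected_signs]
print(coef_signs)
print(unexpected)
```

An empty `unexpected` vector means there are no exceptions to note.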
## Problem 6
Consider the forecasts of median household income for three different counties.
Choose the counties you want. Use the augment() function from broom. Describe how the forecasts
compare with the actuals for these counties.
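```{r}
# Predict income for every county; passing newdata keeps all original columns.
x <- augment(fit, newdata = df)
# Compare actual and predicted income for three counties (the first three rows, an arbitrary choice).
x %>%
  slice(1:3) %>%
  select(state, name, median_household_income, .fitted) %>%
  mutate(error = median_household_income - .fitted)
```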
Put your verbal answer here.
## Problem 7
First get a more convenient dataframe, myvars, with only the variables you used.
```{r}
myvars <- subset(df, select = c("median_household_income", "foreign_spoken_at_home",
                                "mean_work_travel", "density"))
```
Now look at the relationships between the pairs of variables in myvars.
```{r}
plot(myvars)
cor(myvars)
```
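If any column contains missing values, `cor()` returns NA for every pair involving that column. A minimal sketch of the usual workaround, using the built-in `airquality` data as a stand-in (an assumption, since it is known to contain NAs):

```r
# Stand-in data with missing values in the Ozone and Solar.R columns.
raw_cor <- cor(airquality)

# use = "complete.obs" drops every row that has an NA before computing correlations.
clean_cor <- cor(airquality, use = "complete.obs")

print(round(clean_cor, 2))
```

The same `use` argument can be passed as `cor(myvars, use = "complete.obs")` if the county variables turn out to have gaps.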
What do you see that might suggest a reformulation of your model?
Create the new model and produce a summary. Does the new model work better?
```{r}
fit1 <- lm(median_household_income ~ foreign_spoken_at_home + mean_work_travel + density,
           data = myvars)
summary(fit1)
```
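To judge whether a reformulated model works better, compare the adjusted R-squared of the two fits, which is fair when both predict the same response. One common reformulation, when the pairs plot shows a curved relationship or a heavily skewed predictor, is a log transform. The sketch below illustrates the comparison on the built-in `mtcars` data; the choice of data and of the log transform are assumptions for illustration, not the assignment's intended answer.

```r
# Stand-in models on the built-in mtcars data (not the assignment's models).
original_fit <- lm(mpg ~ hp, data = mtcars)
reformulated_fit <- lm(mpg ~ log(hp), data = mtcars)

# Adjusted R-squared is comparable here because both models predict mpg.
comparison <- c(
  original = summary(original_fit)$adj.r.squared,
  reformulated = summary(reformulated_fit)$adj.r.squared
)
print(round(comparison, 3))

# Name of the model with the higher adjusted R-squared.
better_model <- names(which.max(comparison))
print(better_model)
```

For the income models, the same comparison would use `summary(fit)$adj.r.squared` and `summary(fit1)$adj.r.squared`.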