

Question

Create an R script to answer exercise 9. Use R comments (i.e., using "#") to summarize your answers obtained from the corresponding R commands.

9. This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.

(a) Which of the predictors are quantitative, and which are qualitative?

(b) What is the range of each quantitative predictor? You can answer this using the range() function.

(c) What is the mean and standard deviation of each quantitative predictor?

(d) Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?

(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.

(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.

Explanation / Answer

9. This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.

Solution)

# Read "Auto.csv" with read.csv(); this assumes missing values are coded as "?" in the file

auto=read.csv("Auto.csv", header=TRUE, na.strings="?")

# Dimensions before removing missing values

dim(auto)

# Drop every row that contains a missing value (NA)

auto=na.omit(auto)

# Dimensions after removing missing values

dim(auto)
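na.omit() only drops rows that R already recognizes as NA, so it matters how the file was read. The sketch below illustrates this on a small inline CSV. The values and the "?" placeholder are made up for illustration and are not taken from Auto.csv.

```r
# Toy CSV text standing in for a data file (hypothetical values).
csv_text <- "mpg,horsepower,name
18,130,car a
25,?,car b
22,95,car c"

# Without na.strings, "?" is read as ordinary text, so no NA is created
# and na.omit() has nothing to remove.
raw <- read.csv(text = csv_text, header = TRUE)
nrow(na.omit(raw))

# With na.strings = "?", the placeholder becomes NA and the row is dropped.
clean <- read.csv(text = csv_text, header = TRUE, na.strings = "?")
clean <- na.omit(clean)
nrow(clean)
```

Comparing the two row counts shows whether the placeholder was actually converted to NA.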

(a) Which of the predictors are quantitative, and which are qualitative?

Solution)

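# Convert origin to a factor (it is a numeric code for a region, so it is qualitative)

auto$origin=as.factor(auto$origin)

# Keep an integer copy of cylinders for the numeric summaries

auto$cylinders.int=auto$cylinders

# Replace the original cylinders column with a factor version

auto$cylinders=as.factor(auto$cylinders)

# Make sure horsepower is numeric; going through as.character() avoids returning factor level codes

auto$horsepower=as.numeric(as.character(auto$horsepower))

# Answer: name and origin are qualitative.
# mpg, displacement, horsepower, weight, acceleration, and year are quantitative.
# cylinders takes only a few distinct values, so it is kept both as a factor (cylinders)
# and as an integer (cylinders.int).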
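The reason for routing a conversion through as.character() can be seen on a small factor. This is a minimal sketch with made-up values; hp is a hypothetical vector, not a column from the data set.

```r
# A factor whose labels look like numbers (hypothetical values).
hp <- factor(c("130", "95", "150"))

# Direct conversion returns the internal level codes, not the labels.
as.numeric(hp)

# Converting to character first recovers the actual numbers.
as.numeric(as.character(hp))
```

The first call returns positions in the sorted level list, which is why it silently gives wrong values when a numeric column was read as a factor.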

(b) What is the range of each quantitative predictor? You can answer this using the range() function.

Solution)

# Keep only the numeric columns

temp=auto[, sapply(auto, is.numeric)] # drops factors and the character column name

# Apply range() to each column

temp=t(apply(temp, 2, range)) # transposed: one row per variable, two columns

# Add column names Min and Max

colnames(temp)=c("Min", "Max")

# Round to two digits

round(temp, digits = 2)

rm(temp) # remove temp
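One caveat when selecting columns by type: since R 4.0, read.csv() leaves text columns as character rather than factor, so a "not a factor" filter can let a text column through to range(), mean(), or sd(). The sketch below shows the difference on a toy data frame with made-up values.

```r
# Toy data frame (hypothetical values): one numeric, one factor, one character column.
toy <- data.frame(mpg = c(18, 25, 22),
                  origin = factor(c(1, 3, 2)),
                  name = c("car a", "car b", "car c"),
                  stringsAsFactors = FALSE)

# "Not a factor" still keeps the character column.
names(toy)[!sapply(toy, is.factor)]

# "Is numeric" keeps only columns that numeric summaries can handle.
num <- toy[, sapply(toy, is.numeric), drop = FALSE]
ranges <- t(sapply(num, range))
colnames(ranges) <- c("Min", "Max")
ranges
```

The drop = FALSE argument keeps the result a data frame even when only one column qualifies.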

(c) What is the mean and standard deviation of each quantitative predictor?

Solution)

# Keep only the numeric columns

temp=auto[, sapply(auto, is.numeric)] # drops factors and the character column name

# Apply mean() and sd() to each column

temp=t(apply(temp, 2, function(x) c(mean(x), sd(x)))) # transposed: one row per variable, two columns

# Add column names Mean and Std. Deviation

colnames(temp)=c("Mean", "Std. Deviation")

# Round to two digits

round(temp, digits = 2)

rm(temp) # remove temp
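For reference, sd() computes the sample standard deviation, which divides by n - 1 rather than n. A minimal check on made-up numbers (x is a hypothetical vector):

```r
# Hypothetical values used only to check the formula.
x <- c(18, 25, 22, 30, 15)
n <- length(x)

# Sample standard deviation computed by hand, with the n - 1 denominator.
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Should agree with the built-in function up to floating-point tolerance.
all.equal(manual_sd, sd(x))
```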

(d) Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?

Solution)

# Make sure rows are in their original order (row names are the original row numbers)

auto=auto[order(as.numeric(row.names(auto))), ]

# Remove the 10th through 85th observations with negative indexing

auto.rm=auto[-c(10:85), ]

# Keep only the numeric columns

temp=auto.rm[, sapply(auto.rm, is.numeric)] # drops factors and the character column name

# Apply range(), mean(), and sd() to each column

temp=t(apply(temp, 2, function(x) c(range(x), mean(x), sd(x)))) # transposed: one row per variable, four columns

# Add column names Min, Max, Mean and Std. Deviation

colnames(temp)=c("Min", "Max", "Mean", "Std. Deviation")

# Round to two digits

round(temp, digits = 2)

rm(temp) # remove temp
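Negative indexing removes rows by position, not by row name, which matters once earlier rows have been dropped. The sketch below uses a toy data frame with a made-up id column to make the positions visible.

```r
# Toy data frame with 100 rows (hypothetical values).
toy <- data.frame(id = 1:100, value = seq(0.5, 50, by = 0.5))

# Dropping positions 10 through 85 removes 76 rows.
kept <- toy[-(10:85), ]
nrow(kept)

# If a row was removed earlier, positions shift: position 10 now holds id 11.
toy2 <- toy[-5, ]
kept2 <- toy2[-(10:85), ]
kept2$id
```

In the second case the rows removed are the ones with id 11 through 86, so it is worth confirming which observations "10th through 85th" refers to after missing values have been dropped.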

(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.

Solution)

# Keep only the numeric columns

temp=auto[, sapply(auto, is.numeric)] # drops factors and the character column name

# Scatterplot matrix of the numeric variables

pairs(temp, main = "Scatterplot Matrix: Numeric Variables of 'Auto.csv'")

par(mfcol = c(2, 2))

# Create histograms

for (i in 1:ncol(temp)) {

hist(temp[, i], col = "beige",

main = paste("Histogram of auto$", names(temp)[i], sep = ""),

xlab = paste("auto$", names(temp)[i], sep = ""))

}

par(mfcol = c(1, 1))

par(mfcol = c(2, 2))

# Create boxplots

for (i in 1:ncol(temp)) {

boxplot(temp[, i], col = "beige",

main = paste("Boxplot of auto$", names(temp)[i], sep = ""),

ylab = paste("auto$", names(temp)[i], sep = ""))

}

par(mfcol = c(1, 1))

# Remove temp

rm(temp)
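The plots above cover only the numeric columns. A qualitative predictor can be examined with side-by-side boxplots, one box per level. The sketch below uses the built-in mtcars data as a stand-in so it runs without Auto.csv; the variable names mpg and cyl are those of mtcars, not of the Auto data.

```r
# mtcars ships with R; cyl is stored as a number but behaves like a category.
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# One box per cylinder count; boxplot() also returns the plotted statistics invisibly.
bp <- boxplot(mpg ~ cyl, data = cars, col = "beige",
              main = "mpg by number of cylinders (mtcars)",
              xlab = "cyl", ylab = "mpg")

# Group labels, in the order the boxes are drawn.
bp$names
```

The same formula interface should work for a factor column in any data frame, for example a response plotted against a factor-coded origin.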

(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.

Solution)

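# Correlation of mpg with each numeric variable

sapply(auto[, sapply(auto, is.numeric)], function(x) cor(auto$mpg, x))

# Variables whose correlation with mpg is large in absolute value are the most promising predictors.
# In the Auto data, cylinders, displacement, horsepower, and weight show strong negative
# relationships with mpg, while year shows a positive one.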

# Loop-based version of the part (d) summary table, for the subset without observations 10 to 85

auto.rm=auto[-c(10:85), ]

temp=NULL

for (i in seq_along(auto.rm)) {

if (is.numeric(auto.rm[, i])) { # skip factors and character columns

temp=rbind(temp, data.frame(names(auto.rm)[i],

round(min(auto.rm[, i]), digits = 2),

round(max(auto.rm[, i]), digits = 2),

round(mean(auto.rm[, i]), digits = 2),

round(sd(auto.rm[, i]), digits = 2)))

}

}

colnames(temp)=c("Variable", "Min", "Max", "Mean", "Standard Deviation")

temp

rm(temp)
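To compare candidate predictors at a glance, the correlations with the response can be sorted by absolute value. This is a minimal sketch on the built-in mtcars data, used here only as a stand-in; the column names are those of mtcars.

```r
# Keep the numeric columns of the stand-in data set.
num <- mtcars[, sapply(mtcars, is.numeric)]

# Correlation of every variable with mpg, taken from the correlation matrix.
cors <- cor(num)[, "mpg"]

# Drop mpg's correlation with itself.
cors <- cors[names(cors) != "mpg"]

# Strongest relationships first, regardless of sign.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
round(ranked, 2)
```

Correlation only measures linear association, so a curved relationship visible in a scatterplot can still be useful even when its correlation is modest.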
