


Question

2) (20 Points) Each year the EPA does an analysis on a large sample of current vehicle models sold in the United States. The data provided in the data set EpaFE2017Data.csv is a subset of this analysis; if you are curious, you may access the full data set from the EPA website //www.fuele

Using the data set EpaFE2017Data.csv and the R code provided under the Data Analysis 2 tab, obtain descriptive statistics and graphical displays for the variable combined Carbon Dioxide Emissions in grams per mile (CombCO2).

a) (4 points) Include (copy and paste from R) the stemplot, histogram, and boxplot. Which do you feel best summarizes the data and why?
b) (1 point) Is the stem plot a reasonable visual display for this data? Why or why not?
c) (1 point) What does the box represent in the boxplot?
d) (3 points) Give a table that includes the mean, standard deviation, minimum, 1st quartile, median, 3rd quartile, maximum and IQR.
e) (5 points) Use the plots to thoroughly describe the data in the context of the problem. Include the shape, center and spread in your description. State whether there are any outliers.
f) (2 points) Given the shape of the data, which measure, the mean, median or either, would be more appropriate to represent the center of the data? Explain your reasoning.
g) (1 point) Which specific vehicles have the lowest (best) and highest (worst) Carbon Dioxide Emissions?

Explanation / Answer

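# Load the data; file.choose() opens a dialog to select EpaFE2017Data.csv
cardata <- read.csv(file.choose(), header = TRUE)
head(cardata)

# Stemplot, histogram, and boxplot of combined CO2 emissions
stem(cardata$Comb.CO2)
hist(cardata$Comb.CO2, main = "EPA Estimated Combined CO2 Emissions of 2017 Vehicles",
     col = "darkblue", xlab = "Grams of CO2 per mile")
# With horizontal = TRUE the values run along the x-axis, so the label goes in xlab
boxplot(cardata$Comb.CO2, main = "EPA Estimated Combined CO2 Emissions of 2017 Vehicles",
        col = "darkblue", xlab = "Grams of CO2 per mile", horizontal = TRUE)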

(a)

The decimal point is 2 digit(s) to the right of the |

1 | 566788899
2 | 1111222222223344
2 | 55555555566666666666666666677777777777778888888888888888888888888888+25
3 | 00000000000000000000000000000000000111111111111111111111111111111111+127
3 | 55555555555555555555555555555555555555555555555555555555555556666666+203
4 | 00000000000000000000000000000000000000000000000000000000000001111111+154
4 | 55555555555555555555555555555555555555666666666666666666666666666666+103
5 | 00000000000000000000000000000011111111111112222222222222222222222222+24
5 | 555555555555555555555555666666666777777777788899999999
6 | 000000011111222333344444
6 | 5566677889
7 | 00000001222
7 | 556
8 | 3

The boxplot is the most suitable plot for this data set because it gives a compact summary of the distribution (median, quartiles, and potential outliers), which is an advantage with this many observations.

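(b) No. The stemplot is not a reasonable visual display for this data set because there are too many observations: most rows overflow (R cuts them off and reports the remainder as a count such as +203), which makes the display messy and hard to read.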

(c) The box represents the middle 50% of the data: it spans from the 1st quartile to the 3rd quartile, so 25% of the vehicles have a value below the box and 25% have a value above it. The line inside the box marks the median.
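To make the meaning of the box concrete, here is a minimal sketch in R. The vector `co2` holds made-up stand-in values, not the EPA data; in practice you would use `cardata$Comb.CO2` instead. It computes the box edges, the width of the box, the share of observations inside it, and the points that `boxplot()` would draw individually as outliers.

```r
# Hypothetical stand-in values (NOT the EPA data); replace with cardata$Comb.CO2
co2 <- c(154, 250, 300, 340, 380, 395, 410, 450, 470, 520, 829)

# Box edges (1st and 3rd quartile) and the median line inside the box
q <- quantile(co2, c(0.25, 0.50, 0.75))

# Width of the box: IQR = 3rd quartile - 1st quartile
box_width <- IQR(co2)

# Share of observations that fall inside the box (roughly one half)
inside <- mean(co2 >= q[1] & co2 <= q[3])

# Points beyond 1.5 * IQR from the box edges, which boxplot() plots individually
outliers <- boxplot.stats(co2)$out

print(q)
print(box_width)
print(inside)
print(outliers)
```

Note that `boxplot()` draws the box from Tukey's hinges, which can differ slightly from the `quantile()` defaults on some samples, so the two may not match to the last decimal.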

(d)

> xx=c(mean(cardata$Comb.CO2),sd(cardata$Comb.CO2), summary(cardata$Comb.CO2), IQR(cardata$Comb.CO2))
> xx
     Mean        Sd      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.       IQR
405.43650  96.55533 154.00000 336.80000 395.00000 405.40000 466.20000 829.00000 129.50000
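The output above lists the mean twice (once from `mean()` and once, rounded, from `summary()`) and relies on hand-typed column labels. One possible way to build a labeled table directly is sketched below. The vector `co2` again holds made-up stand-in values, and the names `co2` and `stats` are only illustrative; substitute `cardata$Comb.CO2` to get the real figures.

```r
# Hypothetical stand-in values (NOT the EPA data); replace with cardata$Comb.CO2
co2 <- c(154, 210, 336, 395, 405, 466, 520, 829)

# Named vector so each statistic carries its own label when printed
stats <- c(
  Mean   = mean(co2),
  SD     = sd(co2),
  Min    = min(co2),
  Q1     = unname(quantile(co2, 0.25)),
  Median = median(co2),
  Q3     = unname(quantile(co2, 0.75)),
  Max    = max(co2),
  IQR    = IQR(co2)
)

# Round for display only; the stored values keep full precision
print(round(stats, 2))
```

Computing every entry from the same vector avoids the rounding mismatch between the two mean values in the pasted output.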
