Each year the EPA does an analysis on the current models of vehicles sold in the
Question
Each year the EPA does an analysis of the current models of vehicles sold in the United States. The data provided in the data set EpaFE2017Data.csv is a subset of this analysis. If you are curious, you may access the full data set from the EPA website http://www.fueleconomy.gov/feg/download.shtml.
Using the data set EpaFE2017Data.csv and the R code provided under the Data Analysis 1 tab, obtain descriptive statistics and graphical displays for the variable estimated combined fuel efficiency (CombFE).
Stem plot:
Histogram:
Box plot:
b. Is the stem plot a reasonable visual display for this data? Why or why not?
c. What does the box represent in the boxplot?
d. Give a table that includes the mean, standard deviation, minimum, 1st quartile, median, 3rd quartile, maximum and IQR.
e. Given the shape of the data, which measure (the mean, the median, or either) would be more appropriate to represent the center of the data? Explain your reasoning.
f. Which specific vehicles have the best and worst fuel efficiency?
Please show all work. Thank you!
Stem plot output (R): the decimal point is at the |. Stems run from 12 to 56 in steps of 2, and every leaf is 0. The rows are longest for stems 16 through 28, several of them ending in what appear to be truncation counts (such as +42), and they thin out above 30: stems 36 to 42 have only a few leaves each, stems 46, 48, 52 and 56 have one leaf each, and stems 44, 50 and 54 are empty.
Explanation / Answer
Answer to part b)
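No, the stem plot is not a reasonable visual display for this data. The data set is large, so several rows are too long to display in full, and every leaf is 0, so the leaves add nothing beyond a count for each stem. The stem plot also does not show the five-number summary or flag outliers the way a box plot does, and a histogram shows the same shape more compactly.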
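To see why the leaves add so little here: when every leaf is 0, a stem plot reduces to a frequency count per stem. The sketch below is illustrative only. It is written in Python rather than the course's R code, the function name `stem_counts` is hypothetical, and the `sample` values are made up for the example, not taken from EpaFE2017Data.csv. It assumes whole-number values and stems of width 2.

```python
from collections import Counter


def stem_counts(values, width=2):
    """Count how many values fall under each stem of the given width."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical combined fuel-efficiency values, NOT the EPA data.
sample = [14, 17, 17, 19, 21, 21, 22, 23, 24, 26, 29, 35, 56]

# Print one row per stem; each value contributes a single "0" leaf.
for stem, n in stem_counts(sample).items():
    print(f"{stem:>3} | {'0' * n}")
```

Each printed row is just a bar whose length is the count for that stem, which is the information a histogram already gives.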
Answer to part c)
The box in the box plot represents the middle 50% of the data, from the first quartile (Q1) to the third quartile (Q3). The line inside the box marks the median.
The wider the box, the larger the interquartile range (IQR = Q3 - Q1), so the box also shows the IQR.
Answer to part d)
The values for the table can be computed in a spreadsheet such as Excel with the following formulas, where "range of cells" is the column holding CombFE:
Mean: =average(range of cells)
Standard deviation: =stdev(range of cells)
Minimum: =min(range of cells)
1st quartile: =quartile(range of cells,1)
Median: =quartile(range of cells,2)
3rd quartile: =quartile(range of cells,3)
Maximum: =max(range of cells)
IQR: =quartile(range of cells,3)-quartile(range of cells,1)
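As a cross-check on the spreadsheet formulas, the same eight statistics can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the course's R code: `summarize` is a hypothetical helper, and `sample` holds made-up values rather than the CombFE column. The quartiles use `method="inclusive"`, which should interpolate the same way as the spreadsheet QUARTILE function; other quartile conventions can give slightly different values.

```python
import statistics


def summarize(values):
    """Return the summary statistics requested in part d as a dict."""
    # Inclusive quartiles: linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Hypothetical combined fuel-efficiency values, NOT the EPA data.
sample = [14, 17, 19, 21, 22, 24, 26, 29, 35, 56]

for name, value in summarize(sample).items():
    print(f"{name:>6}: {value:.2f}")
```

To apply it to the real data, the values in the CombFE column would replace `sample`.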
Answer to part e)
The best measure of center is the median. The graphs show that the data are skewed to the right (a long tail toward high fuel-efficiency values), and for skewed data the median is a better measure of central tendency than the mean because it is not pulled toward the extreme values.
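A small numeric illustration of why the median is preferred when a few values are far from the rest: the Python sketch below compares the mean and median of a made-up sample with and without its single very high value. The numbers and the cutoff of 50 are hypothetical and are not taken from EpaFE2017Data.csv.

```python
import statistics

# Hypothetical combined fuel-efficiency values, NOT the EPA data.
sample = [14, 17, 19, 21, 22, 24, 26, 29, 35, 56]

# Same sample with the one extreme high value (56) removed.
without_tail = [v for v in sample if v < 50]

mean_shift = statistics.mean(sample) - statistics.mean(without_tail)
median_shift = statistics.median(sample) - statistics.median(without_tail)

print(f"mean moves by   {mean_shift:.2f}")
print(f"median moves by {median_shift:.2f}")
```

In this sample the single high value moves the mean by more than three times as much as it moves the median, which is the sense in which the median is the more robust measure of center.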