


Question

Statistical analysis of data: Means vs. medians

Scientists commonly work with large data sets composed of many individual data points. It is often necessary to present the "center" or "middle" of the distribution of their data set to facilitate comparisons between experimental groups. These are commonly shown with means and medians:

The mean of the data set is calculated by adding all the values together and dividing by the number of values. The mean score on an exam, for example, indicates the total combined scores of all students divided by the number of students. (The mean is also sometimes called the average.)

The median is the value that occurs in the middle of the data set when the points are ranked from lowest to highest. The graphs from Donato's study used medians to compare conifer regeneration and woody fuels at the study sites.

Sometimes using the median is more appropriate than using the mean, such as when outliers are present. Outliers are values that fall well outside the range of the other values in the data set. Because of the way means are calculated, they are heavily influenced by outliers. Medians, on the other hand, are less affected by outliers because they fall in the middle of the data set regardless of the magnitude of the largest and smallest values.

Scientists have statistical tools that help them mathematically determine if a given value is truly an outlier, but potential outliers can be identified by simply scanning the data and looking for one or more values that are significantly higher or lower than the other values in the data set.

Imagine replicating a smaller-scale version of Donato's study in a recently burned forest in your area. Following his team's methods, you collect the measurements below of conifer regeneration on burned-only and burned-and-logged sites. Use the data table to find values for medians and outliers. Enter values as they appear in the table to complete the sentences.

The median of the burned-only sites is ____ seedlings/ha. A possible outlier for the burned-only data set is ____ seedlings/ha. The median of the burned-and-logged sites is ____ seedlings/ha. A possible outlier for the burned-and-logged data set is ____ seedlings/ha.

Explanation / Answer

The median of a data set is easily found by arranging the values in ascending or descending order. If the number of values is odd, the median is the central value; if it is even, the median is the average of the two central values.

Outliers are values that lie more than 1.5 times the inter-quartile range (IQR) below the first quartile (Q1) or above the third quartile (Q3). The inter-quartile range is the difference between the third quartile and the first quartile: IQR = Q3 - Q1.
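The rule above can be sketched in Python. This is only a minimal illustration, assuming quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); other quartile conventions exist and can give slightly different limits. The function names `quartiles` and `find_outliers` and the sample list are hypothetical and not part of the exercise.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # values above it (skips the middle value if n is odd)
    return median(lower), median(data), median(upper)

def find_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    return [v for v in values if v < low_limit or v > high_limit]

# Hypothetical sample, not taken from the exercise
sample = [4, 5, 5, 6, 7, 8, 30]
print(quartiles(sample))      # (5, 6, 8)
print(find_outliers(sample))  # [30]
```

For the sample, IQR = 8 - 5 = 3, so the limits are 0.5 and 12.5, and only 30 falls outside them.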

Arranging the burned-only data in ascending order:

110, 650, 670, 690, 700, 730, 750, 770, 800, 840, 910

Median = 730 (central value)

First quartile, Q1 = 670, Third quartile, Q3 = 800

Inter-quartile range (IQR) = (Q3 - Q1) = 800 - 670 = 130

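Outliers = values more than (1.5 X IQR) below Q1 or above Q3

Lower limit = Q1 - (1.5 X IQR) = 670 - 195 = 475

Upper limit = Q3 + (1.5 X IQR) = 800 + 195 = 995

110 lies below the lower limit of 475, so 110 is an outlier here.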

Arranging the burned-and-logged data in ascending order:

120, 140, 170, 180, 200, 210, 230, 250, 280, 300, 570

Median = 210 (central value)

First quartile (Q1) = 170, Third quartile (Q3) = 280

Inter-quartile range (IQR) = (Q3 - Q1) = 280 - 170 = 110

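Outliers = values more than (1.5 X IQR) below Q1 or above Q3

Lower limit = Q1 - (1.5 X IQR) = 170 - 165 = 5

Upper limit = Q3 + (1.5 X IQR) = 280 + 165 = 445

570 lies above the upper limit of 445, so 570 is an outlier here.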
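As a cross-check of both data sets, the sketch below uses `statistics.quantiles` from Python's standard library (Python 3.8+). Its default "exclusive" method happens to give the same Q1 and Q3 as the hand calculation for these 11-value lists; that agreement is not guaranteed for other data sizes or quartile conventions. The helper name `limits` is hypothetical.

```python
from statistics import quantiles

burned_only = [110, 650, 670, 690, 700, 730, 750, 770, 800, 840, 910]
burned_logged = [120, 140, 170, 180, 200, 210, 230, 250, 280, 300, 570]

def limits(values):
    """Return (lower_limit, upper_limit) for the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # default method is "exclusive"
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

for name, data in [("burned-only", burned_only),
                   ("burned-and-logged", burned_logged)]:
    low, high = limits(data)
    outliers = [v for v in data if v < low or v > high]
    print(name, low, high, outliers)
# burned-only 475.0 995.0 [110]
# burned-and-logged 5.0 445.0 [570]
```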

1) The median of the burned-only sites is 730 seedlings/ha.

2) A possible outlier for the burned-only data set is 110 seedlings/ha.

3) The median of the burned-and-logged sites is 210 seedlings/ha.

4) A possible outlier for the burned-and-logged data set is 570 seedlings/ha.
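To illustrate the question's point that means are more sensitive to outliers than medians, the following sketch compares the mean and median of each data set with and without the flagged value. It is an illustration only; the function name `compare` is hypothetical.

```python
from statistics import mean, median

burned_only = [110, 650, 670, 690, 700, 730, 750, 770, 800, 840, 910]
burned_logged = [120, 140, 170, 180, 200, 210, 230, 250, 280, 300, 570]

def compare(values, outlier):
    """Return (mean, median) for all values, then with the flagged value removed."""
    trimmed = [v for v in values if v != outlier]
    return (mean(values), median(values)), (mean(trimmed), median(trimmed))

for name, data, outlier in [
    ("burned-only", burned_only, 110),
    ("burned-and-logged", burned_logged, 570),
]:
    (m_all, med_all), (m_trim, med_trim) = compare(data, outlier)
    print(f"{name}: mean {m_all:.1f} -> {m_trim:.1f}, "
          f"median {med_all} -> {med_trim}")
```

For the burned-only sites, removing 110 moves the mean from about 692.7 to 751 but the median only from 730 to 740. For the burned-and-logged sites, removing 570 moves the mean from about 240.9 to 208 but the median only from 210 to 205.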
