An important practice is to check the validity of any data set that you analyze.
Question
An important practice is to check the validity of any data set that you analyze. One goal is to detect typos in the data, and another would be to detect faulty measurements. Recall that outliers are observations with values outside the “normal” range of values of the rest of the observations.
Specify a large population that you might want to study and describe the type of numeric measurement that you will collect (examples: a count of things, the height of people, a score on a survey, the weight of something). What would you do if you found a couple of outliers in a sample of size 100? What would you do if you found two values that were twice as big as the next highest value?
You may use examples from your area of interest, such as monthly sales levels of a product, file transfer times to different computers on a network, characteristics of people (height, time to run the 100 meter dash, statistics grades, etc.), trading volume on a stock exchange, or other such things. The example does not have to come from your area of interest; that is just a suggestion.
There is no requirement to use sources from the Internet, but if you use an idea or a quotation from any source, it should be cited (such as putting the author and year at the end of the sentence and then adding a reference at the end to describe the source).
Explanation / Answer
Outliers are one of those statistical issues that everyone knows about, but most people aren’t sure how to deal with. Most parametric statistics, like means, standard deviations, and correlations, and every statistic based on these, are highly sensitive to outliers. And since the assumptions of common statistical procedures, like linear regression and ANOVA, are also based on these statistics, outliers can really mess up your analysis.
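To see how sensitive these statistics are, here is a minimal sketch using Python's standard statistics module. The numbers are made up for illustration (nine typical values plus one extreme value); they are not from any real data set.

```python
import statistics

# Hypothetical sample: nine typical values plus one extreme value.
typical = [52, 55, 58, 60, 61, 63, 65, 67, 70]
with_outlier = typical + [190]

def summarize(values):
    """Return the mean, sample standard deviation, and median of a list."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
    }

clean = summarize(typical)
contaminated = summarize(with_outlier)

# Print the two summaries side by side for comparison.
for name in ("mean", "stdev", "median"):
    print(f"{name:>6}: without outlier = {clean[name]:.2f}, "
          f"with outlier = {contaminated[name]:.2f}")
```

In this made-up sample, the single extreme value moves the mean by more than ten units and inflates the standard deviation several times over, while the median barely changes.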
Despite all this, as much as you’d like to, it is NOT acceptable to drop an observation just because it is an outlier. Outliers can be legitimate observations and are sometimes the most interesting ones. It’s important to investigate the nature of the outlier before deciding.
If it is obvious that the outlier is due to incorrectly entered or measured data, you should drop the outlier:
For example, I once analyzed a data set in which a woman’s weight was recorded as 19 lbs. I knew that was physically impossible. Her true weight was probably 91, 119, or 190 lbs, but since I didn’t know which one, I dropped the outlier.
This also applies to a situation in which you know the datum did not accurately measure what you intended. For example, if you are testing people’s reaction times to an event, but you see that a participant is not paying attention and is randomly hitting the response key, you know it is not an accurate measurement.
If the outlier does not change the results but does affect assumptions, you may drop the outlier, but note that in a footnote of your paper.
More commonly, the outlier affects both results and assumptions. In this situation, it is not legitimate to simply drop the outlier. You may run the analysis both with and without it, but you should state, in at least a footnote, which data points were dropped and how the results changed.
If the outlier alone creates a significant association (that is, the association disappears once the outlier is removed), you should drop the outlier and should not report any significance from your analysis.
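Running the analysis with and without the outlier can be sketched as follows. This is only an illustration: the data are hypothetical, and pearson_r is a small helper written for this example (a plain Pearson correlation), not a function from any particular package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: eight points with no trend, plus one extreme point (last).
x = [1, 2, 3, 4, 5, 6, 7, 8, 30]
y = [5, 3, 6, 4, 5, 3, 6, 4, 25]

r_with = pearson_r(x, y)                 # all nine points
r_without = pearson_r(x[:-1], y[:-1])    # extreme point removed

print(f"r with outlier:    {r_with:.2f}")
print(f"r without outlier: {r_without:.2f}")
```

Here the strong correlation comes entirely from the one extreme point, which is the situation where reporting a significant association would be misleading. Reporting both numbers, as the footnote advice suggests, lets the reader see how much that point matters.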
So in those cases where you shouldn’t drop the outlier, what do you do?
One option is to try a transformation. Square root and log transformations both pull in high numbers. This can make assumptions work better if the outlier is in the dependent variable and can reduce the impact of a single point if the outlier is in an independent variable.
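As a rough sketch of how these transformations pull in high numbers, the snippet below compares how far the largest value sits from the median before and after transforming. The values are hypothetical, and the max-to-median ratio is just a convenient yardstick chosen for this example, not a standard test.

```python
import math
import statistics

# Hypothetical right-skewed values; the last one is the outlier.
values = [4, 9, 16, 25, 36, 400]

def max_to_median_ratio(data):
    """How many times larger the maximum is than the median."""
    return max(data) / statistics.median(data)

# Square root needs non-negative values; log needs strictly positive values.
sqrt_values = [math.sqrt(v) for v in values]
log_values = [math.log(v) for v in values]

ratio_raw = max_to_median_ratio(values)
ratio_sqrt = max_to_median_ratio(sqrt_values)
ratio_log = max_to_median_ratio(log_values)

print(f"raw:  {ratio_raw:.1f}")
print(f"sqrt: {ratio_sqrt:.1f}")
print(f"log:  {ratio_log:.1f}")
```

The log transformation compresses large values more strongly than the square root does, so it is the more aggressive of the two.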
Another option is to try a different model. This should be done with caution, but it may be that a non-linear model fits better. For example, in example 3, perhaps an exponential curve fits the data with the outlier intact.
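To illustrate the idea of a different model, here is a minimal sketch that compares a straight-line fit with an exponential fit on hypothetical data where the last point looks like an outlier under a linear model. The helper fit_line and the data are assumptions made for this example. The exponential curve is fitted by ordinary least squares on the log scale, which is a common shortcut rather than a full non-linear fit.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data that roughly doubles at each step.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 8.2, 15.8, 32.5, 63.0]

# Straight-line model: y = intercept + slope * x
intercept, slope = fit_line(x, y)
sse_linear = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Exponential model: y = a * exp(b * x), fitted as a line to log(y).
log_a, b = fit_line(x, [math.log(yi) for yi in y])
a = math.exp(log_a)
sse_exponential = sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

print(f"linear SSE:      {sse_linear:.2f}")
print(f"exponential SSE: {sse_exponential:.2f}")
```

With these made-up numbers the exponential curve leaves far smaller residuals, so the large final value is no longer unusual once the model matches the shape of the data. Whether such a model is appropriate still depends on whether it makes sense in your research area.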
Whichever approach you take, you need to know your data and your research area well. Try different approaches, and see which ones make theoretical sense.