In dealing with large data sets, addressing missing values is an important step.
Question
In dealing with large data sets, addressing missing values is an important step. However, some datasets contain variables with a large proportion of missing values; in other words, many rows of the dataset have no recorded value for those variables. In such cases, dropping the variables with missing values leads to a loss of significant data. Imputing the missing values might also be of little use, because the imputations would be based on a small number of complete records. What alternatives can you suggest when modeling from such data?
Explanation / Answer
General steps for analysis with missing data:
1) Identify the patterns of, and reasons for, missingness, and recode missing values correctly.
2) Understand the distribution of the missing data.
3) Decide on the best method of analysis.
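As a rough sketch of steps 1 and 2, the snippet below counts how often each variable is missing and tabulates the missingness patterns (which combinations of variables are missing together). It assumes the data are held as plain Python dictionaries with None marking a missing value; the records, column names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": None, "score": 7},
    {"age": None, "income": 52000, "score": 5},
    {"age": 29, "income": None, "score": None},
    {"age": 41, "income": 61000, "score": 8},
]
columns = ["age", "income", "score"]

def missing_rate(rows, column):
    """Fraction of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)

def missing_patterns(rows, columns):
    """Count each combination of missing columns (a 'missingness pattern')."""
    return Counter(tuple(c for c in columns if r[c] is None) for r in rows)

for c in columns:
    print(c, missing_rate(rows, c))
print(missing_patterns(rows, columns))
```

A variable that is missing in most rows, or a pattern that dominates the table, is a signal to investigate why before choosing a method.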
Understanding the data (common reasons for missing values):
• Attrition due to social or natural processes.
• Skip patterns in a survey.
• Intentional missingness as part of the data collection process.
• Random data collection issues.
• Respondent refusal / non-response.
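Skip patterns and refusals often arrive as special numeric codes rather than blanks. A minimal sketch, assuming a hypothetical codebook in which -1 means a designed skip, -8 "don't know" and -9 a refusal, shows one way to recode such values to None while keeping the reason, so that a legitimate skip is not treated the same as non-response.

```python
# Hypothetical survey codes (an assumption, not taken from any real codebook):
#   -1 = question skipped by design (skip pattern)
#   -8 = "don't know"
#   -9 = respondent refused
MISSING_CODES = {-1: "skip", -8: "dont_know", -9: "refused"}

def recode(raw_values):
    """Split raw answers into (value, reason) pairs.

    Sentinel codes become None so they are never averaged as real numbers,
    while the reason is kept so the missingness can be analysed separately.
    """
    out = []
    for v in raw_values:
        if v in MISSING_CODES:
            out.append((None, MISSING_CODES[v]))
        elif v is None:
            out.append((None, "unknown"))   # blank with no recorded reason
        else:
            out.append((v, None))           # a genuine answer
    return out

print(recode([3, -9, 5, -1, None]))
# [(3, None), (None, 'refused'), (5, None), (None, 'skip'), (None, 'unknown')]
```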
Missing data mechanism (the probability distribution of missingness):
Consider the probability of missingness.
Are certain groups more likely to have missing values?
Are certain responses more likely to be missing?
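One simple way to ask whether certain groups are more likely to have missing values is to compare missingness rates across groups. The function name and the toy region/income data below are hypothetical. A clear difference between groups suggests the data are not MCAR, although on its own it cannot tell MAR from MNAR.

```python
from collections import defaultdict

def missing_rate_by_group(rows, group_col, target_col):
    """Share of rows with `target_col` missing, within each level of `group_col`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for r in rows:
        g = r[group_col]
        totals[g] += 1
        missing[g] += r[target_col] is None   # True counts as 1
    return {g: missing[g] / totals[g] for g in totals}

# Hypothetical data: does income go missing more often in one region?
rows = [
    {"region": "north", "income": 40000},
    {"region": "north", "income": None},
    {"region": "south", "income": None},
    {"region": "south", "income": None},
    {"region": "south", "income": 38000},
]
print(missing_rate_by_group(rows, "region", "income"))
# {'north': 0.5, 'south': 0.6666666666666666}
```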
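Missing data mechanisms:
1) Missing completely at random (MCAR):
The probability that a value is missing depends neither on the observed data nor on the missing value itself.
2) Missing at random (MAR):
The probability that a value is missing depends only on the observed data (other recorded variables), not on the missing value itself.
3) Missing not at random (MNAR):
The probability that a value is missing depends on the unobserved value itself, even after accounting for the observed data.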
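The difference between the three mechanisms can be seen in a small simulation. This sketch assumes a fully observed covariate x and an outcome y = x + noise with true mean 0, and compares the complete-case mean of y under each mechanism; the 50% and 80% missingness rates are arbitrary choices.

```python
import random
import statistics

random.seed(0)
N = 50_000

# Simulated data (assumption): x is fully observed, y = x + noise, true mean of y is 0.
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [x + random.gauss(0, 1) for x in xs]

def complete_case_mean(ys, is_missing):
    """Mean of y over the rows where y was observed."""
    return statistics.fmean(y for y, m in zip(ys, is_missing) if not m)

# MCAR: every y has the same 50% chance of being missing.
mcar = [random.random() < 0.5 for _ in range(N)]
# MAR: y is more likely to be missing when the *observed* x is positive.
mar = [x > 0 and random.random() < 0.8 for x in xs]
# MNAR: y is more likely to be missing when y *itself* is positive.
mnar = [y > 0 and random.random() < 0.8 for y in ys]

for name, mask in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, round(complete_case_mean(ys, mask), 3))
```

Under MCAR the complete-case mean should stay close to 0, while under MAR and MNAR it is pulled downwards because high values are preferentially lost. The MAR bias can in principle be corrected using x, since x is observed; the MNAR bias cannot be corrected from the observed data alone.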
Exploring the missing data mechanism:
We can never be certain about the probability of missingness, because the observed data alone cannot distinguish MAR from MNAR.
Some methods we can use are:
Selection models.
Pattern-mixture models.
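A common way to use a pattern-mixture model is as a sensitivity analysis (often called delta adjustment): the rows with missing y are assumed to differ from what an MAR model predicts by a shift delta, and the estimate is recomputed for a range of delta values. The sketch below assumes a single fully observed covariate x, a straight-line model for y, and hypothetical toy data; delta = 0 reproduces the MAR answer.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def pattern_mixture_mean(xs, ys, delta):
    """Overall mean of y when the missing-y pattern is assumed to sit
    `delta` away from the MAR regression prediction (delta = 0 is MAR)."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in obs], [y for _, y in obs])
    filled = [y if y is not None else a + b * x + delta for x, y in zip(xs, ys)]
    return statistics.fmean(filled)

# Hypothetical toy data: y is missing for x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.0, 6.0, None, None, 12.0]
for delta in (-3, 0, 3):
    print(delta, pattern_mixture_mean(xs, ys, delta))
# -3 6.0
# 0 7.0
# 3 8.0
```

If the conclusions change substantially over plausible values of delta, the results depend on untestable assumptions about how far the data depart from MAR.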
Dealing with missing data:
Use what you know about why the data are missing.
Decide on the analysis strategy that yields the least biased estimates:
Deletion methods.
Single imputation methods.
Model-based methods (such as multiple imputation or maximum likelihood).
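The three strategies can be compared on simulated MAR data. In this sketch the data, the linear imputation model and M = 5 imputations are all assumptions: deletion is complete-case analysis, the single imputation method is mean imputation, and the model-based method is a simplified multiple imputation by stochastic regression on x, pooled with Rubin's rules. It is simplified because it does not redraw the regression parameters for each imputation, which a full implementation (for example one based on chained equations, MICE) would do.

```python
import random
import statistics

random.seed(1)
N = 20_000
M = 5  # number of imputations (assumption)

# Simulated data (assumption): x fully observed, y = x + noise, true mean of y is 0.
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [x + random.gauss(0, 1) for x in xs]
# MAR mechanism: y is dropped more often when the observed x is positive.
obs_y = [None if (x > 0 and random.random() < 0.8) else y for x, y in zip(xs, ys)]
cc = [(x, y) for x, y in zip(xs, obs_y) if y is not None]   # complete cases

# 1) Deletion: analyse complete cases only.
deletion_est = statistics.fmean(y for _, y in cc)

# 2) Single imputation: fill every gap with the observed mean.
#    For a mean this reproduces the deletion estimate and shrinks the variance.
mean_filled = [y if y is not None else deletion_est for y in obs_y]
single_est = statistics.fmean(mean_filled)

# 3) Model-based: regress y on x in the complete cases ...
cx, cy = [x for x, _ in cc], [y for _, y in cc]
mx, my = statistics.fmean(cx), statistics.fmean(cy)
b = sum((x - mx) * (y - my) for x, y in cc) / sum((x - mx) ** 2 for x in cx)
a = my - b * mx
resid_sd = statistics.stdev([y - (a + b * x) for x, y in cc])

# ... then impute M times, drawing each missing y from its predictive distribution.
estimates, within_vars = [], []
for _ in range(M):
    filled = [y if y is not None else a + b * x + random.gauss(0, resid_sd)
              for x, y in zip(xs, obs_y)]
    estimates.append(statistics.fmean(filled))
    within_vars.append(statistics.variance(filled) / N)   # squared SE of this mean

# Rubin's rules: average the estimates; total variance = within + (1 + 1/M) * between.
mi_est = statistics.fmean(estimates)
total_var = statistics.fmean(within_vars) + (1 + 1 / M) * statistics.variance(estimates)

print(f"deletion {deletion_est:.3f}  mean imputation {single_est:.3f}  "
      f"MI {mi_est:.3f} (SE {total_var ** 0.5:.3f})")
```

With missingness driven by x, deletion and mean imputation both give a mean well below 0, and mean imputation also understates the variability; the multiple-imputation estimate should be close to 0 because the imputation model conditions on x.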