Question
Question 5. When or why should we use oversampling?

- When the costs of failing to identify rare events are low.
- To de-emphasize rare events to the learning algorithm.
- When a data set used for learning a predictive model for a binary response variable includes significantly more items with one choice of the response variable (e.g., Y = 1) than of the other choice (e.g., Y = 0), and we seek to accurately predict both choices.
- When a data set used for learning a predictive model for a binary response variable includes roughly the same number of items for each choice (i.e., Y = 1 and Y = 0) of the response variable.
- None of these are correct.

Question 6. Which of the following is the most accurate statement about classification and prediction in the context of data mining, as these terms are used by the textbook authors?

- Prediction refers to estimating values of categorical response variables; classification refers to determining choices for numerical response variables.
- Classification is sometimes referred to as regression.
- Classification refers to determining choices for categorical response variables; prediction refers to estimating values of numerical response variables.
- Prediction is synonymous with classification.
- All of these are correct.

Question 7. Which is an example of a business application of unsupervised learning?

- Classifying credit card transactions as valid or fraudulent.
- Categorizing customers as likely or unlikely to respond to a promotional offer.
- All of these are correct.
- Identifying products that customers are likely to purchase together.
- Predicting home sale prices in a neighborhood.

Explanation / Answer
Solution:
Question 5: Option C. Oversampling should be used when a data set used for learning a predictive model for a binary response variable includes significantly more items with one choice of the response variable (e.g., Y = 1) than of the other (e.g., Y = 0), and we seek to accurately predict both choices.
Oversampling in data analysis is a technique used to adjust the class distribution of a data set (i.e., the ratio between the different classes/categories represented).
Oversampling involves using a bias to select more samples from one class than from another.
The usual reason for oversampling is to correct for a bias in the original dataset. One scenario where it is useful is when training a classifier using labelled training data from a biased source, since labelled training data is valuable but often comes from un-representative sources.
For example, suppose we have a sample of 1000 people of which 66.7% are male. We know the general population is 50% female, and we may wish to adjust our dataset to reflect this. Simple oversampling will select each female example twice, and this copying produces a balanced dataset of 1333 samples with roughly 50% female. Simple undersampling will instead drop male samples at random, giving a balanced dataset of roughly 666 samples, again with about 50% female.
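The same idea applies to a binary response variable: duplicate minority-class records until the two classes are roughly balanced. Below is a minimal sketch in plain Python; the function name, the (features, label) row layout, and the toy data are illustrative assumptions, not part of the question or any particular library.

```python
import random

def oversample_minority(rows, label_index=-1, seed=0):
    """Duplicate minority-class rows (sampling with replacement) until both
    classes of a binary response variable are roughly the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_index], []).append(row)
    if len(by_class) != 2:
        raise ValueError("expected a binary response variable")
    # Largest class first, smallest second.
    (_, majority), (_, minority) = sorted(
        by_class.items(), key=lambda kv: len(kv[1]), reverse=True)
    # Draw extra minority rows (with replacement) to match the majority size.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Toy example: 6 items with Y=0 and 2 items with Y=1 become 12 balanced items.
data = [(x, 0) for x in range(6)] + [(x, 1) for x in range(2)]
balanced = oversample_minority(data, label_index=1)
print(sum(1 for _, y in balanced if y == 1), "of", len(balanced), "are Y=1")
```

In practice libraries provide equivalent helpers, but the core operation is exactly this duplication of minority-class records so the learning algorithm does not ignore the rare class.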
Question 6: Option C. In the textbook's usage, classification refers to determining choices for a categorical response variable (predicting class labels), while prediction refers to estimating values of a numerical response variable. This matches the view in Han and Kamber's "Data Mining: Concepts and Techniques": predicting class labels is classification, and predicting numerical values (e.g., using regression techniques) is prediction.
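To make the distinction concrete, here is a minimal sketch assuming scikit-learn is available; the credit-score features and loan outcomes are invented purely for illustration.

```python
# Classification: categorical response.  Prediction (regression): numerical response.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[600], [650], [700], [750], [800]]            # e.g., applicant credit score

# Classification: the response is a class label (default vs. repay).
y_class = ["default", "default", "repay", "repay", "repay"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[620]]))                        # -> a categorical label

# Prediction: the response is a numerical value (e.g., loan amount granted).
y_num = [5000.0, 8000.0, 12000.0, 15000.0, 20000.0]
reg = DecisionTreeRegressor().fit(X, y_num)
print(reg.predict([[620]]))                        # -> an estimated number
```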
Question 7: Option D. Unsupervised learning is learning from data without labeled outputs: we do not know in advance what the output should look like. In every other option the desired output is known for the training data (valid/fraudulent, respond/not respond, sale price), so those are supervised learning tasks. Identifying products that customers are likely to purchase together (market basket / association analysis) requires no labeled outcome, so it is an unsupervised learning application; a small sketch of the idea follows.
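The sketch below, assuming a small in-memory set of invented transactions, counts how often pairs of items co-occur, which is the core of association analysis; full association-rule mining (e.g., Apriori) would also compute confidence and lift.

```python
from itertools import combinations
from collections import Counter

# Each transaction is the set of products in one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

# Count co-occurrences of every pair of items, with no labels involved.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(transactions)
for pair, count in pair_counts.most_common(3):
    # Support = fraction of all transactions containing both items.
    print(pair, f"support = {count / n:.2f}")
```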