
Briefly explain why we use data mining, specifically on large data sets


Question

In your own words (no copying), briefly explain why we use data mining, specifically on large data sets (big data).

List and briefly explain the major data mining tasks, with examples.

Explain different approaches to handle the problem of missing values of attributes while data cleaning.

Explain each of the following characteristics of a data warehouse, as mentioned in its definition:

“A data warehouse is a (1) subject-oriented, (2) integrated, (3) time-variant, and (4) nonvolatile collection of data in support of (5) management’s decision-making process”

Explanation / Answer

Data mining is the process of extracting useful information from huge sets of data. In other words, it is the procedure of mining knowledge from data. On large data sets the patterns are too numerous and too subtle to find by manual inspection, so automated techniques are needed. The extracted information or knowledge can be used in applications such as the following:

    Market Analysis

    Fraud Detection

    Customer Retention

    Production Control

    Science Exploration

One approach to handling missing values is to ignore the row entirely. This is usually done when the class label is missing (assuming your data mining goal is classification), or when many attributes are missing from the row (not just one). However, the method performs poorly if the percentage of such rows is high, because a large share of the data is thrown away.

For example, let’s say we have a database of student enrollment data (age, SAT score, state of residence, etc.) and a column classifying each student’s success in college as “Low”, “Medium”, or “High”. Let’s say our goal is to build a model that predicts a student’s success in college. Rows that are missing the success column are not useful for predicting success, so they can simply be ignored and removed before running the algorithm.
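The student example can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names (`age`, `sat`, `state`, `success`), the helper `ignore_incomplete_rows`, and the `max_missing_attrs` threshold are all made up for this sketch, and `None` is assumed to mark a missing value.

```python
# Hypothetical student records; None marks a missing value.
students = [
    {"age": 18, "sat": 1350, "state": "CA", "success": "High"},
    {"age": 19, "sat": None, "state": "NY", "success": "Medium"},
    {"age": 18, "sat": 1100, "state": "TX", "success": None},     # class label missing
    {"age": None, "sat": None, "state": None, "success": "Low"},  # most attributes missing
    {"age": 20, "sat": 1220, "state": "WA", "success": "Low"},
]


def ignore_incomplete_rows(rows, label, max_missing_attrs=1):
    """Keep rows that have a class label and at most
    max_missing_attrs missing non-label attributes."""
    kept = []
    for row in rows:
        # A row without a class label cannot be used to train a classifier.
        if row[label] is None:
            continue
        # Count missing attributes other than the label itself.
        missing = sum(
            1 for key, value in row.items()
            if key != label and value is None
        )
        if missing > max_missing_attrs:
            continue
        kept.append(row)
    return kept


cleaned = ignore_incomplete_rows(students, label="success")

# Track how much data was discarded; a high fraction is a warning sign
# that ignoring rows is the wrong strategy for this data set.
dropped_fraction = 1 - len(cleaned) / len(students)
print(f"{len(cleaned)} rows kept, {dropped_fraction:.0%} dropped")
```

Checking the dropped fraction before training is a cheap way to notice when this approach is discarding too much data and another strategy (such as filling in missing values) would be a better fit.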