Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.


Question

11. Below we see a set of 20 points and a decision tree for classifying the points.

[Figure: scatter plot of the 20 (Age, Salary) points together with the decision tree; green points buy House, gold points do not.]

To be precise, the 20 points represent (Age, Salary) pairs of people who do or do not buy House. Age is the x-axis, and Salary is the y-axis. Those that do are represented by green points, and those that do not by gold points. The 10 points of House buyers are: (23,40), (25,120), (29,97), (33,22), (35,63), (42,52), (44,40), (55,63), (55,30), and (64,37). The 10 points of those that do not buy House are: (28,145), (38,115), (43,83), (51,130), (50,90), (50,60), (49,30), (55,118), (63,88), and (65,140). Some of these points are correctly classified by the decision tree and some are not. Determine the classification of each point, and then indicate in the list below the point that is misclassified.

a. (63, 88)
b. (55, 63)
c. (29, 97)
d. (49, 30)

12. Consider the process of building a decision-tree-based classifier using Entropy as a measure of impurity associated with a tree node that represents a subset of training examples. A node is split into partitions represented by its child nodes based on the values of a selected attribute. The goodness of the attribute for the split, referred to as the information gain of the attribute, is the difference between the impurity of the parent node and the weighted sum of the impurities of the child nodes.

Explanation / Answer

(11)

As per the given decision tree:

Condition-1: Buy House if (Age < 40 and Salary >= 100) or (Age >= 40 and Salary >= 50).

Condition-2: Do not buy House if (Age < 40 and Salary < 100) or (Age >= 40 and Salary < 50).

Point a. (63, 88) falls under Condition-1 (predicted buyer), but it is actually a non-buyer.

Point b. (55, 63) falls under Condition-1 (predicted buyer), and it is actually a buyer.

Point c. (29, 97) falls under Condition-2 (predicted non-buyer), but it is actually a buyer.

Point d. (49, 30) falls under Condition-2 (predicted non-buyer), and it is actually a non-buyer.

Hence, Point a. (63, 88) and Point c. (29, 97) are misclassified.
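As a quick sanity check, the four candidate points can be run through the tree's two conditions in a short script (a sketch; the true labels come from the buyer/non-buyer lists in the question):

```python
# Decision-tree rule: buy House if (Age < 40 and Salary >= 100)
# or (Age >= 40 and Salary >= 50); otherwise do not buy.
def predicts_buy(age, salary):
    return (age < 40 and salary >= 100) or (age >= 40 and salary >= 50)

# True labels taken from the question's lists: True = buys House.
actual = {
    (63, 88): False,  # listed among the non-buyers
    (55, 63): True,   # listed among the buyers
    (29, 97): True,   # listed among the buyers
    (49, 30): False,  # listed among the non-buyers
}

candidates = {"a": (63, 88), "b": (55, 63), "c": (29, 97), "d": (49, 30)}
misclassified = [label for label, point in candidates.items()
                 if predicts_buy(*point) != actual[point]]
print(misclassified)  # -> ['a', 'c']
```

Running this confirms that only points a and c disagree with the tree's prediction.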

(12)

A1            A2       C1    C2
------------  -------  ----  ----
Big Data      Male       2    10
Database      Male       5    40
Data Mining   Male       7    35
Text Mining   Male      10     4
Big Data      Female    20    16
Database      Female     8     5
Data Mining   Female    28    29
Text Mining   Female    20    20

As per the given data and decision-tree construction:

Option-d: Splitting based on either of the attributes, A1 or A2, reduces the impurity existing in the given training set at least to some extent.
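To back this up, the information gain of each split can be computed directly from the class counts in the table, using the entropy definition from question 12 (a minimal sketch; the helper names are my own):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a vector of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Rows of the table above: (A1, A2, C1 count, C2 count).
rows = [
    ("Big Data", "Male", 2, 10), ("Database", "Male", 5, 40),
    ("Data Mining", "Male", 7, 35), ("Text Mining", "Male", 10, 4),
    ("Big Data", "Female", 20, 16), ("Database", "Female", 8, 5),
    ("Data Mining", "Female", 28, 29), ("Text Mining", "Female", 20, 20),
]

def info_gain(attr_index):
    """Information gain of splitting on A1 (attr_index=0) or A2 (attr_index=1)."""
    n = sum(r[2] + r[3] for r in rows)
    parent = entropy([sum(r[2] for r in rows), sum(r[3] for r in rows)])
    # Aggregate (C1, C2) counts per value of the chosen attribute.
    groups = {}
    for r in rows:
        g = groups.setdefault(r[attr_index], [0, 0])
        g[0] += r[2]
        g[1] += r[3]
    children = sum(sum(g) / n * entropy(g) for g in groups.values())
    return parent - children

# Both gains come out positive, so either split reduces impurity (option d).
print(f"gain(A1) = {info_gain(0):.4f}")
print(f"gain(A2) = {info_gain(1):.4f}")
```

Both information gains are strictly positive, which is exactly the claim in option-d: splitting on either A1 or A2 reduces the impurity at least to some extent.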

