Question
a) Compute the Gini index for the overall collection of the training examples.
b) Compute the Gini index for the Customer ID attribute.
c) Compute the Gini index for the Gender attribute.
d) Compute the Gini index for the car type attribute using multiway split.
e) Compute the Gini index for the Shirt Size attribute using multiway split.
f) Which attribute is better, Gender, Car Type, or Shirt Size?
g) Explain why customer ID should not be used as the attribute test condition even though it has the lowest Gini.
Explanation / Answer
a) Gini = 1 − 2 × 0.5² = 0.5.
b) The Gini for each Customer ID value is 0, because each value covers a single example. Therefore, the overall Gini for Customer ID is 0.
c) The Gini for Male is 1 − 2 × 0.5² = 0.5. The Gini for Female is also 0.5. Therefore, the overall Gini for Gender is 0.5 × 0.5 + 0.5 × 0.5 = 0.5.
d) The Gini for Family car is 0.375, Sports car is 0, and Luxury car is 0.2188. The overall Gini, taken as the average weighted by the number of examples in each partition, is 0.1625.
e) The Gini for Small shirt size is 0.48, Medium shirt size is 0.4898, Large shirt size is 0.5, and Extra Large shirt size is 0.5. The overall (weighted) Gini for the Shirt Size attribute is 0.4914.
f) Car Type, because it has the lowest Gini among the three attributes.
g) Customer ID has no predictive power: every new customer is assigned a new ID that never appears in the training data, so a split on this attribute cannot generalize to unseen customers.
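The arithmetic in parts a), b), d), and e) can be checked with a short Python sketch. The training table is not reproduced on this page, so the per-value class counts below (written as [C0, C1]) are assumptions chosen to match the per-value Gini figures quoted in the answers. The helper names gini and split_gini are illustrative only.

```python
def gini(counts):
    """Gini index of a single node, given its per-class example counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gini(partitions):
    """Weighted Gini of a multiway split.

    partitions is a list of per-class count lists, one per attribute value.
    Each partition's Gini is weighted by its share of the examples.
    """
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)


# ASSUMED class counts [C0, C1]; not taken from the page itself.
overall = gini([10, 10])                                    # part a)
car_type = split_gini([[1, 3], [8, 0], [1, 7]])             # Family, Sports, Luxury
shirt_size = split_gini([[3, 2], [3, 4], [2, 2], [2, 2]])   # S, M, L, XL
customer_id = split_gini([[1, 0]] * 10 + [[0, 1]] * 10)     # one example per ID

print(round(overall, 4))
print(round(car_type, 4))
print(round(shirt_size, 4))
print(round(customer_id, 4))
```

Under these assumed counts, the Customer ID split scores 0 no matter how the labels are arranged, since each partition holds one example. That is why a low Gini alone does not make it a useful test condition.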