
Question

We will build a naïve Bayes classifier based on the below weather dataset in order to determine whether to play golf or not. There are four categorical attributes (outlook, temperature, humidity, windy) and one binary target (play).

outlook   temperature  humidity  windy  play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no

a) For a day when it is sunny, hot, windy with high humidity, please use naïve Bayes to predict if we should play golf or not.

b) Does the prediction agree with the classification provided in the training data set?

c) For a day that is overcast, hot, and not windy with normal humidity, please apply the Laplace method to predict whether one should play golf or not.

d) Use the same training dataset (as in Problem 6) for golf playing and calculate the better splitting attribute (between OUTLOOK and HUMIDITY) to use as the first level attribute in constructing decision tree with Gini index.


Explanation / Answer

Answer a)

The values of the 1st attribute, outlook, are sunny, overcast, and rainy.
The values of the 2nd attribute, temperature, are hot, mild, and cool.
The values of the 3rd attribute, humidity, are high and normal.
The values of the 4th attribute, windy, are true and false.

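Of the 14 training rows, 9 have play = yes and 5 have play = no, so the class priors are
P(Yes) = 9/14 and P(No) = 5/14

Below, P(x/C) denotes the conditional probability of attribute value x given class C, estimated as the fraction of class-C rows that have value x.

For the 1st attribute, Outlook:
P(sunny/Yes) = 2/9, P(sunny/No) = 3/5, P(overcast/Yes) = 4/9, P(overcast/No) = 0/5 = 0, P(rainy/Yes) = 3/9, P(rainy/No) = 2/5

For the 2nd attribute, Temperature:
P(hot/Yes) = 2/9, P(mild/Yes) = 4/9, P(cool/Yes) = 3/9, P(hot/No) = 2/5, P(mild/No) = 2/5, P(cool/No) = 1/5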

For the 3rd attribute, Humidity:
P(high/Yes) = 3/9, P(high/No) = 4/5, P(normal/Yes) = 6/9, P(normal/No) = 1/5

For the 4th attribute, Windy:
P(true/Yes) = 3/9, P(false/Yes) = 6/9, P(true/No) = 3/5, P(false/No) = 2/5
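
The conditional probabilities listed above can be re-derived by counting rows in the training table. The following is a minimal Python sketch, assuming the 14 rows are transcribed exactly as given in the question. The names DATA, ROWS, ATTRIBUTES, and likelihood are illustrative and not part of the original answer.

```python
from collections import Counter

# Training rows transcribed from the question:
# outlook, temperature, humidity, windy, play
DATA = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
ROWS = [tuple(line.split()) for line in DATA.strip().splitlines()]
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Number of rows per class label (play = yes / no).
class_counts = Counter(row[-1] for row in ROWS)

# joint_counts[(attribute, value, label)] = rows with that value and label.
joint_counts = Counter()
for row in ROWS:
    label = row[-1]
    for name, value in zip(ATTRIBUTES, row):
        joint_counts[(name, value, label)] += 1


def likelihood(name, value, label):
    """Unsmoothed estimate of P(value/label) as (numerator, denominator)."""
    return joint_counts[(name, value, label)], class_counts[label]


# Print every combination that occurs at least once in the table.
for name, value, label in sorted(joint_counts):
    num, den = likelihood(name, value, label)
    print(f"{name}: P({value}/{label}) = {num}/{den}")
```

Combinations that never occur in the table, such as overcast with play = no, are not printed by the loop, but likelihood still returns a zero numerator for them.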

Now consider the day that is sunny, hot, and windy with high humidity, i.e. D = <outlook = sunny, temperature = hot, humidity = high, windy = true>.

P(D/Yes) × P(Yes) = P(sunny/Yes) × P(hot/Yes) × P(high/Yes) × P(true/Yes) × P(Yes)

= 2/9 × 2/9 × 3/9 × 3/9 × 9/14 ≈ 0.003527

and P(D/No) × P(No) = P(sunny/No) × P(hot/No) × P(high/No) × P(true/No) × P(No)

= 3/5 × 2/5 × 4/5 × 3/5 × 5/14 ≈ 0.041143

Since P(D/No) × P(No) > P(D/Yes) × P(Yes), naïve Bayes predicts play = no, so we should not play golf. Normalizing the two scores gives P(No/D) ≈ 0.041143 / (0.041143 + 0.003527) ≈ 0.92.
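
The arithmetic for the two class scores can be checked with exact fractions. This is only a sketch: the fractions are copied from the products written out above, and the names prior, likelihoods, scores, and posterior are illustrative.

```python
from fractions import Fraction as F

# Class priors and the four likelihoods for D = <sunny, hot, high, windy = true>,
# in the order outlook, temperature, humidity, windy.
prior = {"yes": F(9, 14), "no": F(5, 14)}
likelihoods = {
    "yes": [F(2, 9), F(2, 9), F(3, 9), F(3, 9)],
    "no": [F(3, 5), F(2, 5), F(4, 5), F(3, 5)],
}

# Naive Bayes score for each class: prior times the product of likelihoods.
scores = {}
for label in prior:
    score = prior[label]
    for p in likelihoods[label]:
        score *= p
    scores[label] = score

# Normalize the scores so they sum to 1, giving posterior probabilities.
total = sum(scores.values())
posterior = {label: scores[label] / total for label in scores}
prediction = max(scores, key=scores.get)

for label in scores:
    print(f"{label}: score = {float(scores[label]):.6f}, "
          f"posterior = {float(posterior[label]):.3f}")
print("prediction:", prediction)
```

Working with Fraction avoids rounding until the final print, so the comparison between the two scores is exact.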

Answer b)

Yes, the prediction agrees with the classification in the training data set: the second training row (sunny, hot, high humidity, windy = true) is labeled play = no.

Answer c)
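
One way to approach part c) is sketched below in Python. It is not the original author's worked answer. It assumes "Laplace method" means add-one smoothing of each conditional estimate, (count + 1) / (class count + number of distinct values of that attribute), and it leaves the class priors unsmoothed. Both choices are assumptions, and the names DATA, ROWS, smoothed, and query are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Training rows transcribed from the question:
# outlook, temperature, humidity, windy, play
DATA = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
ROWS = [tuple(line.split()) for line in DATA.strip().splitlines()]
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# The day described in part c): overcast, hot, normal humidity, not windy.
query = {"outlook": "overcast", "temperature": "hot",
         "humidity": "normal", "windy": "false"}

class_counts = Counter(row[-1] for row in ROWS)
joint_counts = Counter()
values = {name: set() for name in ATTRIBUTES}
for row in ROWS:
    for name, value in zip(ATTRIBUTES, row):
        joint_counts[(name, value, row[-1])] += 1
        values[name].add(value)


def smoothed(name, value, label, alpha=1):
    """Add-alpha estimate: (count + alpha) / (class count + alpha * #values)."""
    return Fraction(joint_counts[(name, value, label)] + alpha,
                    class_counts[label] + alpha * len(values[name]))


scores = {}
for label, n in class_counts.items():
    score = Fraction(n, len(ROWS))  # prior left unsmoothed (an assumption)
    for name in ATTRIBUTES:
        score *= smoothed(name, query[name], label)
    scores[label] = score

prediction = max(scores, key=scores.get)
for label, score in scores.items():
    print(f"{label}: {score} = {float(score):.6f}")
print("prediction:", prediction)
```

Without smoothing, no overcast row in the table has play = no, so the "no" score would be exactly zero regardless of the other attributes. Under the assumptions stated here, the smoothed scores are about 0.0271 for yes and 0.0020 for no, so the sketch favours playing.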
