
Problem 4. Suppose that a Bayesian spam filter is trained on a set of 1000 spam


Question

Problem 4. Suppose that a Bayesian spam filter is trained on a set of 1000 spam messages and 250 messages that are not spam. The word "cruise" appears in 50 spam messages and in 2 messages that are not spam, while the word "urgent" appears in 100 spam messages and in 10 messages that are not spam. Would an incoming message be rejected as spam if it contains both words "cruise" and "urgent" and the threshold for rejecting spam is 0.9? (Assume, for simplicity, that the message is equally likely to be spam as it is not to be spam and that the two words are used independently.) Provide detailed justifications for your answers.

Explanation / Answer

ANSWER:

Step 1: Set up events

A1 = message is spam

A2 = message is not spam

B1 = message contains the word “cruise”

B2 = message contains the word “urgent”

Step 2: Identify probabilities

P(B1|A1) = 50/1000 = 0.05

P(B1|A2) = 2/250 = 0.008

P(B2|A1) = 100/1000 = 0.1

P(B2|A2) = 10/250 = 0.04

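Step 3: Apply Bayes' theorem

Because P(A1) = P(A2) = 0.5, the priors cancel, and because the two words are assumed to be used independently, their conditional probabilities multiply:

P(A1 | B1 ∩ B2) = P(B1|A1)P(B2|A1) / [P(B1|A1)P(B2|A1) + P(B1|A2)P(B2|A2)]

                = (0.05)(0.1) / [(0.05)(0.1) + (0.008)(0.04)]

                = 0.005 / 0.00532

                ≈ 0.9398

Conclusion: Since the probability that the message is spam, given that it contains both “cruise” and “urgent”, is approximately 0.9398 > 0.9, the filter rejects this message as spam.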
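
The arithmetic can be checked with a short script. This is a minimal sketch that assumes, as the question does, equal priors for spam and non-spam and independent word usage. The function name `spam_posterior` and the `counts` layout are illustrative choices, not part of the original problem.

```python
# Training counts taken from the problem statement.
SPAM_TOTAL = 1000   # spam messages in the training set
HAM_TOTAL = 250     # non-spam messages in the training set
THRESHOLD = 0.9     # rejection threshold given in the question

# word -> (number of spam messages containing it, number of non-spam messages containing it)
counts = {"cruise": (50, 2), "urgent": (100, 10)}


def spam_posterior(words, counts, spam_total, ham_total):
    """Return P(spam | all words present).

    Assumes equal priors (so they cancel) and that the words
    occur independently within each class.
    """
    p_words_given_spam = 1.0
    p_words_given_ham = 1.0
    for word in words:
        in_spam, in_ham = counts[word]
        p_words_given_spam *= in_spam / spam_total
        p_words_given_ham *= in_ham / ham_total
    return p_words_given_spam / (p_words_given_spam + p_words_given_ham)


posterior = spam_posterior(["cruise", "urgent"], counts, SPAM_TOTAL, HAM_TOTAL)
print(f"P(spam | cruise, urgent) = {posterior:.4f}")  # prints 0.9398
print("reject as spam" if posterior > THRESHOLD else "accept")
```

Running it with a single word, for example `spam_posterior(["cruise"], counts, SPAM_TOTAL, HAM_TOTAL)`, shows how much each word contributes on its own before the two are combined.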
