A simple e-mail spam filter uses Bayes' theorem to predict whether a given mess
Question
A simple e-mail spam filter uses Bayes' theorem to predict whether a given message is spam or not.
We have a "training" data set of 1,000 e-mails, 500 of which have been identified as spam by a human analyst. The phrase "Instant winner" appears in 460 of the spam messages, and 15 of the non-spam messages.
For the internet as a whole, 90% of all e-mails are estimated to be spam messages. If a new e-mail arrives containing the phrase "Instant winner", what is the probability that it is a spam message?
(Hint: use Bayes' theorem, and be careful when you calculate P("instant winner")! Our sample of 1,000 e-mails isn't representative of the fact that 90% of all e-mails are spam...)
Explanation / Answer
Ans:
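The training set is 50% spam only because of how it was assembled, so it is used just for the conditional probabilities. The prior comes from the 90% estimate for e-mail as a whole, as the hint suggests.
P(spam) = 0.9
P(non-spam) = 1 - 0.9 = 0.1
P(instant winner | spam) = 460/500 = 0.92
P(instant winner | non-spam) = 15/500 = 0.03
P(instant winner) = P(instant winner | spam)*P(spam) + P(instant winner | non-spam)*P(non-spam)
= 0.92*0.9 + 0.03*0.1
= 0.828 + 0.003
= 0.831
P(spam | instant winner) = P(instant winner | spam)*P(spam) / P(instant winner)
= 0.828/0.831
= 0.9964 (approximately)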
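As a quick check of the arithmetic, here is a minimal Python sketch of the Bayes' theorem calculation. The function name `posterior_spam` and the variable names are made up for this illustration and are not part of the question. It assumes the likelihoods come from the 1,000-message training set and compares two choices of prior: the 90% base rate stated in the question, and the 50% spam share of the training sample.

```python
def posterior_spam(p_phrase_given_spam, p_phrase_given_ham, prior_spam):
    """Return P(spam | phrase) using Bayes' theorem."""
    prior_ham = 1.0 - prior_spam
    numerator = p_phrase_given_spam * prior_spam
    # Total probability of seeing the phrase: P(phrase)
    evidence = numerator + p_phrase_given_ham * prior_ham
    return numerator / evidence


# Likelihoods estimated from the training set (500 spam, 500 non-spam)
p_phrase_given_spam = 460 / 500  # 0.92
p_phrase_given_ham = 15 / 500    # 0.03

# Prior taken from the question's estimate that 90% of all e-mail is spam
with_base_rate = posterior_spam(p_phrase_given_spam, p_phrase_given_ham, 0.90)

# Prior taken from the training sample itself (500 of 1,000 are spam)
with_sample_prior = posterior_spam(p_phrase_given_spam, p_phrase_given_ham, 0.50)

print(f"prior 0.90: {with_base_rate:.4f}")     # prior 0.90: 0.9964
print(f"prior 0.50: {with_sample_prior:.4f}")  # prior 0.50: 0.9684
```

The two results differ only through the prior. Since the question states that the sample's 50/50 split does not reflect real traffic, the 0.90 prior is the one the hint points to, giving roughly 0.9964.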