We are interested in using the following document-term matrix and the associated
Question
We are interested in using the following document-term matrix and the associated relevance information as training data for a probabilistic retrieval model. A 1 entry indicates that the term occurs in a document, and a 0 means it does not. R or NR indicates the relevance of the document with respect to queries in the training data. Using the basic probabilistic retrieval model, compute the relevance and non-relevance probabilities associated with terms T1 through T6 (show these probabilities in a table). Then, using these probabilities and the given query Q = (1, 1, 0, 1, 0, 1), compute the discriminants Disc(Q, D11) and Disc(Q, D12) for the two new documents:
D11 = (0, 1, 1, 0, 0, 1)
D12 = (1, 0, 1, 1, 0, 1)
Based on the discriminants, should these documents be retrieved? Explain your answer.
Explanation / Answer
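The computation the question describes can be sketched in Python. The training matrix is not reproduced in this post, so the `docs` and `labels` values below are hypothetical placeholders, and the printed numbers are not the answer to the exercise. The sketch also assumes add-0.5 smoothing of the term probabilities (to avoid taking the log of zero) and the common convention of retrieving a document when its discriminant is positive; the course material may define either differently. The function names are illustrative only.

```python
import math

def term_probabilities(docs, labels, smoothing=0.5):
    """Estimate P(t_i = 1 | R) and P(t_i = 1 | NR) for every term."""
    n_terms = len(docs[0])
    rel = [d for d, lab in zip(docs, labels) if lab == "R"]
    nonrel = [d for d, lab in zip(docs, labels) if lab == "NR"]

    def estimate(group):
        # Smoothed fraction of documents in the group that contain term i
        # (assumption: add-0.5 smoothing, not stated in the question).
        return [
            (sum(d[i] for d in group) + smoothing) / (len(group) + 2 * smoothing)
            for i in range(n_terms)
        ]

    return estimate(rel), estimate(nonrel)

def discriminant(query, doc, p_rel, p_nonrel):
    """Sum the log-odds term weights over terms present in both query and document."""
    return sum(
        math.log(p * (1 - u) / (u * (1 - p)))
        for q, d, p, u in zip(query, doc, p_rel, p_nonrel)
        if q == 1 and d == 1
    )

# HYPOTHETICAL placeholder training data: replace with the real
# document-term matrix and relevance labels from the exercise.
docs = [
    (1, 1, 0, 1, 0, 0),
    (1, 0, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1),
    (0, 0, 1, 0, 1, 0),
]
labels = ["R", "R", "NR", "NR"]

# Query and new documents as given in the question.
Q = (1, 1, 0, 1, 0, 1)
D11 = (0, 1, 1, 0, 0, 1)
D12 = (1, 0, 1, 1, 0, 1)

p_rel, p_nonrel = term_probabilities(docs, labels)
for name, doc in (("D11", D11), ("D12", D12)):
    disc = discriminant(Q, doc, p_rel, p_nonrel)
    # Assumed decision rule: retrieve when the discriminant is positive.
    print(name, round(disc, 3), "retrieve" if disc > 0 else "do not retrieve")
```

Only terms that occur in both the query and the document contribute to the sum, so a document sharing no terms with Q gets a discriminant of 0 under this formulation.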
Types of Random Variables
Completely determined by domain (types of output)
Discrete: RV values = finite or countable
ex: coin tossing, dice-rolling, counts, words in a language
additivity: $\sum_x P(x) = 1$
P(X = x) is a sensible concept
Continuous: RV values are real numbers
ex: distances, times, parameter values for IR models
additivity: $\int_x p(x)\,dx = 1$
P(X = x) is always zero, p(x) is a “density” function
Singular RVs … never see them in IR
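The two additivity conditions can be checked numerically. The sketch below assumes a fair six-sided die for the discrete case and an exponential waiting time with rate 1 for the continuous case; both distributions are illustrative choices, not part of the original notes, and the integral is approximated with a simple midpoint rule.

```python
import math
from fractions import Fraction

# Discrete case: fair six-sided die, P(X = x) = 1/6 for each face (assumed example).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
discrete_total = sum(die_pmf.values())

# Continuous case: exponential density p(x) = exp(-x) for x >= 0 (assumed example).
def exp_pdf(x):
    return math.exp(-x)

def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The tail beyond x = 50 is negligible, so [0, 50] stands in for [0, infinity).
continuous_total = integrate(exp_pdf, 0.0, 50.0)

print(discrete_total)    # sum of the probability mass function
print(continuous_total)  # numerical integral of the density
```

For the die, every individual value has positive probability and the masses sum to one. For the waiting time, any single exact value has probability zero, and only the area under the density is meaningful.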
Conditional Probabilities
P(A | B) … probability of event A happening
assuming we know B happened
Example:
population size: 10,000,000
number of scientists: 10,000
Nobel prize winners: 10 (1 is an engineer)
P(scientist) = 0.001
P(scientist | Nobel prize) = 0.9
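The two probabilities in the example follow directly from the counts. The sketch below assumes that the one engineer is the only Nobel prize winner who is not a scientist, which is how the 0.9 figure arises; the variable names are illustrative.

```python
from fractions import Fraction

population = 10_000_000
scientists = 10_000
nobel_winners = 10
# Assumption: the single engineer is the only non-scientist among the winners.
nobel_scientists = nobel_winners - 1

p_scientist = Fraction(scientists, population)
p_nobel = Fraction(nobel_winners, population)
p_scientist_and_nobel = Fraction(nobel_scientists, population)

# Definition of conditional probability: P(A | B) = P(A and B) / P(B)
p_scientist_given_nobel = p_scientist_and_nobel / p_nobel

print(float(p_scientist))              # 0.001
print(float(p_scientist_given_nobel))  # 0.9
```

Knowing that someone won a Nobel prize raises the probability that they are a scientist from 0.001 to 0.9, which is the point of conditioning on B.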