Retrieval Models The goal of a retrieval model is to score and rank documents fo
Question
Retrieval Models. The goal of a retrieval model is to score and rank documents for a query. Different retrieval models make different assumptions about what makes a document more (or less) relevant than another. Suppose you issue the query "lemur" to a search engine, and suppose that documents D101 and D123 both contain the term "lemur" five times. Answer the following questions. For parts (d) and (e), assume that every document has the same prior. (a) Would the inner product (with a binary representation) necessarily give both documents the same score? If not, what information would determine which document is scored higher? [4 points]
Explanation / Answer
A. With a binary text representation, the inner product is just the number of distinct terms that the query and the document have in common. Term frequency is ignored, so the five occurrences of "lemur" in each document count only once. Because the query has a single term and that term appears in both documents, each document gets an inner-product score of one, and the two documents necessarily receive the same score.
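To make the binary inner product concrete, here is a minimal Python sketch. The function name `binary_inner_product` and the extra words placed in D101 and D123 are hypothetical, chosen only for illustration; the question specifies nothing about the documents beyond the five occurrences of "lemur".

```python
def binary_inner_product(query_terms, doc_terms):
    """Count the distinct terms shared by the query and the document.

    With binary vectors, each shared term contributes 1 * 1 = 1,
    no matter how often it occurs.
    """
    return len(set(query_terms) & set(doc_terms))


query = ["lemur"]

# Hypothetical document contents: only the five "lemur" occurrences
# come from the question; the other words are made up.
d101 = ["lemur"] * 5 + ["madagascar", "primate"]
d123 = ["lemur"] * 5 + ["zoo", "ring", "tailed", "species"]

score_d101 = binary_inner_product(query, d101)
score_d123 = binary_inner_product(query, d123)
print(score_d101, score_d123)
```

Whatever else the documents contain, a single-term query can share at most one distinct term with each of them, so the two scores always tie.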
C. Cosine similarity is the inner product divided by the product of the query's vector length and the document's vector length. With a binary representation, the vector length of a document is the square root of its number of unique terms. So the two documents could receive different scores if they contain different numbers of unique terms. The document with fewer unique terms (note: not fewer term occurrences) would get the higher score.
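The effect of the number of unique terms can be sketched in Python as follows. This assumes binary term vectors; the function name `binary_cosine` and the document contents other than the five "lemur" occurrences are hypothetical.

```python
import math


def binary_cosine(query_terms, doc_terms):
    """Cosine similarity between binary term vectors.

    The numerator is the number of shared distinct terms; each vector's
    length is the square root of its number of distinct terms.
    """
    q, d = set(query_terms), set(doc_terms)
    if not q or not d:
        return 0.0  # avoid dividing by a zero-length vector
    return len(q & d) / (math.sqrt(len(q)) * math.sqrt(len(d)))


query = ["lemur"]

# Hypothetical contents: D101 has 3 unique terms, D123 has 5.
d101 = ["lemur"] * 5 + ["madagascar", "primate"]
d123 = ["lemur"] * 5 + ["zoo", "ring", "tailed", "species"]

cos_d101 = binary_cosine(query, d101)  # 1 / sqrt(3)
cos_d123 = binary_cosine(query, d123)  # 1 / sqrt(5)
print(cos_d101, cos_d123)
```

In this made-up example D101 scores higher because it has fewer unique terms, even though both documents contain "lemur" five times.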
D. The query-likelihood model scores documents by the probability of the query given the document language model. For a single-term query, and assuming no linear interpolation (that is, no smoothing with a collection model), the score reduces to the proportion of the document's text accounted for by the query term. In other words, it is the number of times the term occurs divided by the total number of term occurrences in the document. Because we don't know the number of term occurrences in each document, we cannot say for sure that both documents would get the same score. The document with fewer term occurrences would get the higher score.
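A minimal Python sketch of the unsmoothed query-likelihood score is shown below. The function name `query_likelihood` and the document lengths (10 and 20 term occurrences) are assumptions made for illustration; the question gives no document lengths.

```python
from collections import Counter


def query_likelihood(query_terms, doc_terms):
    """Unsmoothed query likelihood: the product, over query terms, of
    (occurrences of the term in the document) / (document length)."""
    total = len(doc_terms)
    if total == 0:
        return 0.0  # an empty document cannot generate the query
    counts = Counter(doc_terms)
    score = 1.0
    for term in query_terms:
        score *= counts[term] / total
    return score


query = ["lemur"]

# Hypothetical lengths: both documents contain "lemur" five times,
# but D101 has 10 term occurrences in total and D123 has 20.
d101 = ["lemur"] * 5 + ["filler"] * 5
d123 = ["lemur"] * 5 + ["filler"] * 15

ql_d101 = query_likelihood(query, d101)  # 5 / 10
ql_d123 = query_likelihood(query, d123)  # 5 / 20
print(ql_d101, ql_d123)
```

With equal term counts, the shorter document (here D101) gets the higher score, and the scores tie only when the two documents have the same length.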