Question
Thanks for the help!!
2. Consider a document-term matrix, where $f_{ij}$ is the frequency of the $j$th word (term) in the $i$th document and $n$ is the number of documents. Consider the variable transformation

$f'_{ij} = f_{ij} \cdot \log(n / g_j)$,

where $g_j$ is the number of documents in which the $j$th term appears and is known as the document frequency of the term. This transformation is known as the inverse document frequency transformation.
(a) What is the effect of this transformation if a term occurs in one document? In every document?
(b) What might be the purpose of this transformation?

Explanation / Answer
(a)
If a term occurs in only one document, then $g_j = 1$ and $\log(n / g_j) = \log n$, the largest value this factor can take, so the raw frequency is scaled up as much as possible and $f'_{ij}$ is high. The $j$th term is treated as a significant (rare) term that strongly influences the similarity measure between documents. If a term occurs in every document, then $g_j = n$ and $\log(n / g_j) = \log 1 = 0$, so $f'_{ij} = 0$ in every document. The $j$th term is treated as a common term that can be found everywhere, so it no longer affects the similarity value.
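To make the two extreme cases concrete, here is a minimal Python sketch of the transformation applied to a small made-up matrix. The function name `idf_transform` and the toy counts are illustrative assumptions, and the natural logarithm is assumed; changing the log base only rescales every weight by the same constant.

```python
import math

def idf_transform(f):
    """Return f'_ij = f_ij * log(n / g_j) for a document-term matrix f.

    f is a list of n rows (one per document); column j holds the
    frequency of term j. Natural log is assumed here.
    """
    n = len(f)
    num_terms = len(f[0])
    # g[j] is the document frequency: how many documents contain term j
    g = [sum(1 for row in f if row[j] > 0) for j in range(num_terms)]
    # Guard g[j] == 0 (term never appears) to avoid dividing by zero
    return [
        [row[j] * math.log(n / g[j]) if g[j] > 0 else 0.0 for j in range(num_terms)]
        for row in f
    ]

# Hypothetical toy matrix: 4 documents, 3 terms.
# Term 0 appears only in document 0; term 1 appears in every document;
# term 2 appears in documents 0 and 2.
f = [
    [2, 3, 1],
    [0, 5, 0],
    [0, 1, 4],
    [0, 2, 0],
]
weighted = idf_transform(f)
print(weighted[0][0])                # rare term: 2 * log(4/1), about 2.77
print([row[1] for row in weighted])  # term in every document: all zeros
```

The rare term keeps its count multiplied by the maximum factor $\log n$, while the term present everywhere is zeroed out regardless of how often it occurs.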
(b)
The purpose of this transformation is to weight each term by its importance. Terms that occur in many documents are down-weighted, and terms that occur in every document (for which $\log(n / g_j) = 0$), such as “to”, “is”, and “the”, are automatically eliminated from the similarity calculation, while rare terms that help distinguish documents receive higher weights.
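The effect on similarity can be sketched the same way. Cosine similarity is an assumed choice of measure here (the answer does not name one), and the three-term vocabulary and counts are hypothetical. Before the transformation, documents 0 and 1 look nearly identical because both are dominated by “the”; afterwards “the” has weight zero, and since the two documents share no other term their similarity drops to 0.

```python
import math

def idf_transform(f):
    """Return f'_ij = f_ij * log(n / g_j), natural log assumed."""
    n = len(f)
    num_terms = len(f[0])
    g = [sum(1 for row in f if row[j] > 0) for j in range(num_terms)]
    return [
        [row[j] * math.log(n / g[j]) if g[j] > 0 else 0.0 for j in range(num_terms)]
        for row in f
    ]

def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

# Hypothetical counts for the terms ["the", "cat", "stock"]
f = [
    [10, 1, 0],  # doc 0
    [12, 0, 1],  # doc 1
    [8, 1, 0],   # doc 2
]
w = idf_transform(f)

print(cosine(f[0], f[1]))  # raw counts: close to 1, driven by "the"
print(cosine(w[0], w[1]))  # after IDF: 0.0, no informative term in common
print(cosine(w[0], w[2]))  # after IDF: about 1.0, both share "cat"
```

Note that “cat” appears in two of the three documents, so it is down-weighted by $\log(3/2)$ rather than removed; only the term present in all documents is eliminated outright.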