The following code counts how often different words occur in a text file. In thi
ID: 3719550 • Letter: T
Question
The following code counts how often different words occur in a text file. In this program, bstmap<Key,Data> has an interface and behavior identical to a std::map but is implemented using simple binary search trees.
Let W denote the number of words in the document file being processed.
Many of these words, in a typical document, will be duplicates. Let D denote the number of distinct (non-duplicate) words, D <= W. A study of several large text files has suggested that, for sufficiently large W, authors will use an average vocabulary of D=c*log(W) distinct words, where c is some constant.
Assume the length of any individual word is bounded by some constant (e.g., no single word is longer than "antidisestablishmentarianism").
What is the average-case complexity for countWordsInFile?
Explanation / Answer
char toLC (char c)
{
return (c >= 'A' && c <= 'Z') ? c-'A'+'a' : c; ===> O(1)
}
void countWordsInFile (istream& inFile, bstmap<string,int>& counts)
{
string word;
while (inFile >> word) ===> O(W)
{
// strip away any non-alphabetic characters
string::size_type n = word.find_first_not_of (wordCharacters); ====> O(K : K is the size of word antidisestablishmentarianism)
if (n != string::npos) =====> O(1)
word = word.substr (0, n); ======> Worst case : O(K)
if (word.length() > 0) =====> O(1)
{
// if there's anything left, count it
word = transform(word.begin(), word.end(), word.begin(), toLC); ====> O(K)
/** ... increment the appropriate counter in counts ... **/
map<string,int>::iterator pos = counts.find (word); ====> O(log(c*log(W)))
if (pos == counts.end()) // if this is the 1st time we've seen this word. ====> O(1)
counts.insert (map<string,int>::value_type(word, 1)); ====> O(log(c*log(W))
else // else if we've seen this word before
++((*pos).second); ====> O(1)
}
}
}
Hence, the average time complexity of this algorithm is
= O(W) * O(K + K + 2*log(c*log(W)))
= O(W * (K + log(c*log(W)))
where, K = length of any individual word
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.