Question
How do I write an algorithm in pseudocode for this?
Given an article such as this one at nytimes.com, design an algorithm to find the top 150 most frequently co-occurring word pairs in the article. Two words are said to co-occur if they appear in the same sentence.

For example, the last sentence in the article, "It's really a milestone in Chinese science fiction.", contains the following word pairs:
('It's', 'really') ('It's', 'a') ('It's', 'milestone') ('It's', 'in') ('It's', 'Chinese') ('It's', 'science') ('It's', 'fiction')
('really', 'a') ('really', 'milestone') ('really', 'in') ('really', 'Chinese') ('really', 'science') ('really', 'fiction')
('a', 'milestone') ('a', 'in') ('a', 'Chinese') ('a', 'science') ('a', 'fiction')
('milestone', 'in') ('milestone', 'Chinese') ('milestone', 'science') ('milestone', 'fiction')
('in', 'Chinese') ('in', 'science') ('in', 'fiction')
('Chinese', 'science') ('Chinese', 'fiction')
('science', 'fiction')

You can assume you have access to a subroutine, sentenceSplitter(article), that accurately segments an article into separate sentences and returns them in an array-like structure. You can also assume you have access to another routine, tokenizer(sentence), that accurately identifies the individual words in the input sentence and returns them in another array-like structure.

Explanation / Answer
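The heart of the problem is listing every pair of words within one sentence. As a quick check of the example sentence, here is a minimal Python sketch that generates its pairs; a plain whitespace split stands in for tokenizer(sentence), which is an assumption, and the final period is left off by hand.

```python
from itertools import combinations

def sentence_pairs(words):
    """Return every (earlier word, later word) pair from one sentence."""
    # combinations() yields pairs in positional order with i < j,
    # which is the same as a nested i/j loop over the sentence's words.
    return list(combinations(words, 2))

# Assumption: a whitespace split stands in for tokenizer(sentence);
# the sentence's final period is omitted by hand.
words = "It's really a milestone in Chinese science fiction".split()
pairs = sentence_pairs(words)
print(len(pairs))  # 28
```

An n-word sentence yields n(n-1)/2 pairs, which is why the example sentence's 8 words produce 28.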
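// Declare a hashmap whose key is a set of two words (wordPair) and whose value is its frequency.
// The pair is unordered, so ('a', 'b') and ('b', 'a') are the same key.
// Every time we see a new word pair, we insert it into the map with frequency 1.
// Every time we see a word pair that is already in the map, we increase its frequency.
// At the end, we sort the map entries by frequency (highest first); the first 150 entries are the most frequently co-occurring pairs.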
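The comments above describe a hashmap keyed by an unordered pair of words. Below is a minimal Python sketch of just that counting step, assuming the sentences are already tokenized; count_pairs is a hypothetical helper name, and collections.Counter plays the role of the hashmap.

```python
from collections import Counter
from itertools import combinations

def count_pairs(tokenized_sentences):
    """Count co-occurring word pairs across already-tokenized sentences."""
    counts = Counter()  # the hashmap: (word, word) -> frequency
    for words in tokenized_sentences:
        for a, b in combinations(words, 2):
            # Put the two words in alphabetical order so ('x', 'y') and
            # ('y', 'x') share one key, i.e. the key acts as a set of two words.
            key = (a, b) if a <= b else (b, a)
            # A missing key reads as 0, so no separate "insert" branch is needed.
            counts[key] += 1
    return counts

counts = count_pairs([["science", "fiction"], ["fiction", "science", "award"]])
print(counts[("fiction", "science")])  # 2
```

Both sentences contribute to the same ('fiction', 'science') entry even though the words appear in opposite orders.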
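declare map<wordPair, int> wordPairMap   // key: word pair, value: frequency

// subroutines provided by the problem statement
sentenceSplitter(article)
    return sentences_vector   // array of sentences
tokenizer(sentence)
    return words_vector       // array of words

main()
    string article
    sentence_vector = sentenceSplitter(article)
    // iterate through every sentence in the article
    for each sentence in sentence_vector
        word_vector = tokenizer(sentence)
        // visit every pair of positions (i, j) with i < j exactly once
        for i = 0 to word_vector.size - 1
            for j = i + 1 to word_vector.size - 1
                // store the two words in a fixed (alphabetical) order so that
                // (a, b) and (b, a) map to the same key
                word_pair = (min(word_vector[i], word_vector[j]), max(word_vector[i], word_vector[j]))
                // if the map already contains word_pair, just increment its frequency
                if wordPairMap contains word_pair
                    wordPairMap[word_pair]++
                else
                    // otherwise insert a new entry with frequency 1
                    wordPairMap.insert(word_pair, 1)
    // sort the map entries by value (frequency), highest first
    sortedPairs = wordPairMap.sortByValueDescending()
    // print the 150 most frequent pairs (or all of them, if there are fewer)
    for k = 0 to min(150, sortedPairs.size) - 1
        Print(sortedPairs[k].key, sortedPairs[k].value)
end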
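For readers who want to run the algorithm, here is a hedged Python translation of the pseudocode. sentence_splitter and tokenizer are naive regex stand-ins for the provided subroutines (hypothetical, not the real routines), and lowercasing every word is an added assumption so that 'It's' and 'it's' count as the same word. Instead of sorting the whole map, this sketch uses heapq.nlargest to keep only the top k pairs.

```python
import heapq
import re
from collections import Counter
from itertools import combinations

def sentence_splitter(article):
    # Hypothetical stand-in for sentenceSplitter(article):
    # split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]

def tokenizer(sentence):
    # Hypothetical stand-in for tokenizer(sentence): runs of letters/digits,
    # optionally with an apostrophe suffix. Lowercasing is an assumption.
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", sentence)]

def top_cooccurring_pairs(article, k=150):
    counts = Counter()  # hashmap: (word, word) -> frequency
    for sentence in sentence_splitter(article):
        words = tokenizer(sentence)
        for a, b in combinations(words, 2):  # every i < j pair in the sentence
            counts[(a, b) if a <= b else (b, a)] += 1  # order-independent key
    # nlargest keeps at most k candidates in a heap instead of sorting all pairs.
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])

article = "Science fiction is popular. Chinese science fiction is growing."
for pair, freq in top_cooccurring_pairs(article, k=3):
    print(pair, freq)
```

Sorting all P distinct pairs costs O(P log P), while nlargest costs O(P log k); for a single article either is fine, but the heap is the usual choice for top-k queries. A word repeated within a sentence is paired once per occurrence; if each pair should count at most once per sentence, collect that sentence's keys in a set before updating the counts.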