Natural language processing using Python NLTK package, WILL RATE ASAP E-book: ht
ID: 672205 • Letter: N
Question
Natural language processing using Python NLTK package, WILL RATE ASAP
E-book: http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf
Question:
Complete the following subparts, in the order given:
1. Consider the sequence of words: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. This is a
grammatically correct sentence, as explained at http://en.wikipedia.org/wiki/
Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo . Consider the tree diagram
presented on this Wikipedia page, and write down a suitable grammar. Normalize case to lowercase, to
simulate the problem that a listener has when hearing this sentence. Can you find other parses for this
sentence? How does the number of parse trees grow as the sentence gets longer? (More examples of these
sentences can be found at http://en.wikipedia.org/wiki/List_of_homophonous_phrases) .
2. You can modify the grammar in the recursive descent parser demo by selecting Edit Grammar in the Edit
menu. Change the first expansion production, namely NP -> Det N PP, to NP -> NP PP. Using the Step
button, try to build a parse tree. Describe what happens.
Explanation / Answer
def init_wfst(tokens, grammar): numtokens = len(tokens) wfst = [[None for i in range(numtokens+1)] for j in range(numtokens+1)] for i in range(numtokens): productions = grammar.productions(rhs=tokens[i]) wfst[i][i+1] = productions[0].lhs() return wfst def complete_wfst(wfst, tokens, grammar, trace=False): index = dict((p.rhs(), p.lhs()) for p in grammar.productions()) numtokens = len(tokens) for span in range(2, numtokens+1): for start in range(numtokens+1-span): end = start + span for mid in range(start+1, end): nt1, nt2 = wfst[start][mid], wfst[mid][end] if nt1 and nt2 and (nt1,nt2) in index: wfst[start][end] = index[(nt1,nt2)] if trace: print("[%s] %3s [%s] %3s [%s] ==> [%s] %3s [%s]" % (start, nt1, mid, nt2, end, start, index[(nt1,nt2)], end)) return wfst def display(wfst, tokens): print(' WFST ' + ' '.join(("%-4d" % i) for i in range(1, len(wfst)))) for i in range(len(wfst)-1): print("%d " % i, end=" ") for j in range(1, len(wfst)): print("%-4s" % (wfst[i][j] or '.'), end=" ") print() >>> tokens = "I shot an elephant in my pajamas".split() >>> wfst0 = init_wfst(tokens, groucho_grammar) >>> display(wfst0, tokens) WFST 1 2 3 4 5 6 7 0 NP . . . . . . 1 . V . . . . . 2 . . Det . . . . 3 . . . N . . . 4 . . . . P . . 5 . . . . . Det . 6 . . . . . . N >>> wfst1 = complete_wfst(wfst0, tokens, groucho_grammar) >>> display(wfst1, tokens) WFST 1 2 3 4 5 6 7 0 NP . . S . . S 1 . V . VP . . VP 2 . . Det NP . . . 3 . . . N . . . 4 . . . . P . PP 5 . . . . . Det NP 6 . . . . . . N
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.