Can anyone help me with this problem? Python. I have to create a dictionary of b
Question
Can anyone help me with this problem? Python.
I have to create a dictionary of bigrams, meaning each entry should have the form [previous word, current word].
Building an MLE bigram model [Coding only: save code as problem2.py ]
Now, you’ll create an MLE bigram model, in much the same way as you created an MLE unigram model. I recommend writing the code again from scratch, however (except for the code initializing the mapping dictionary), so that you can test things as you go. The main differences between coding an MLE bigram model and a unigram model are:
• Select an appropriate data structure to store bigrams.
• You’ll increment counts for a combination of word and previous word. This means you’ll need to keep track of what the previous word was.
• You will compute the probability of the current word based on the previous word count.
Prob of curr word = count(prev word, curr word) / count(prev word)
Consider we observed the following word sequences:
finger remarked
finger on
finger on
finger in
finger .
Notice that "finger on" was observed twice. Also notice that the period is treated as a separate word. Given the information in this data structure, we can compute the probability p(on|finger) as 2/5 = 0.4. Similarly, we can compute the probability p(.|finger) as 1/5 = 0.2.
When complete, add code to write 100 random probabilities (you can select a word and bigram term randomly) to bigram_probs.txt, one per line:
• p(on|finger) = 0.4
• p(.|finger) = 0.2
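The arithmetic in this example can be checked with a short sketch. It assumes a nested dictionary keyed first by previous word and then by current word, which is only one possible choice of data structure; the names `observed`, `bigram_counts`, and `mle_prob` are illustrative and do not come from the assignment. It also takes count(prev word) to be the number of bigrams that start with that word.

```python
from collections import defaultdict

# The five (previous word, current word) pairs observed above.
observed = [
    ("finger", "remarked"),
    ("finger", "on"),
    ("finger", "on"),
    ("finger", "in"),
    ("finger", "."),
]

# bigram_counts[prev][curr] = number of times curr followed prev
bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, curr in observed:
    bigram_counts[prev][curr] += 1


def mle_prob(prev, curr):
    """Return count(prev, curr) / count(prev), or 0.0 if prev was never seen."""
    followers = bigram_counts.get(prev)
    if not followers:
        return 0.0
    prev_total = sum(followers.values())
    return followers.get(curr, 0) / prev_total


print("p(on|finger) =", mle_prob("finger", "on"))
print("p(.|finger) =", mle_prob("finger", "."))
```

With the nested layout, count(prev word) is just the sum of one inner dictionary, so no separate unigram table is needed for this calculation.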
I also have the code for the unigram model:
Explanation / Answer
Python code for counting bigrams:
The code below takes the name of the corpus file as a command-line argument:
import string
import sys

# complain if we didn't get a filename
# as a command line argument
if len(sys.argv) < 2:
    print("Please enter the name of a corpus file as a command line argument.")
    sys.exit()

# try opening the file
# If the file doesn't exist, catch the error
try:
    f = open(sys.argv[1])
except IOError:
    print("Sorry, I could not find the file", sys.argv[1])
    print("Please try again.")
    sys.exit()

# read the contents of the whole file into ''filecontents''
filecontents = f.read()
f.close()

# count bigrams
bigrams = {}
words_punct = filecontents.split()

# strip all punctuation at the beginning and end of words, and
# convert all words to lowercase
words = [w.strip(string.punctuation).lower() for w in words_punct]

# add special START, END tokens
words = ["START"] + words + ["END"]

for index, word in enumerate(words):
    if index < len(words) - 1:
        # we only look at indices up to the
        # next-to-last word, as this is
        # the last one at which a bigram starts
        w1 = words[index]
        w2 = words[index + 1]
        # bigram is a tuple,
        # like a list, but fixed.
        # Tuples can be keys in a dictionary
        bigram = (w1, w2)
        if bigram in bigrams:
            bigrams[bigram] = bigrams[bigram] + 1
        else:
            bigrams[bigram] = 1

# sort bigrams by their counts, most frequent first
sorted_bigrams = sorted(bigrams.items(), key=lambda pair: pair[1], reverse=True)
for bigram, count in sorted_bigrams:
    print(bigram, ":", count)
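The script above stops at bigram counts and strips punctuation, whereas the question asks for probabilities and treats the period as a separate word. The following is a minimal sketch of those remaining steps, not a reference solution. The tokenizer regex, the function names, and the `seed` parameter are assumptions made for illustration; only the output file name bigram_probs.txt and the `p(curr|prev) = value` line format come from the question.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it so punctuation marks become separate tokens."""
    # Assumption: a token is a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def bigram_probabilities(tokens):
    """Map each (prev, curr) pair to count(prev, curr) / count(prev)."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Count each word only where it acts as a previous word (every token but the last).
    prev_counts = Counter(tokens[:-1])
    return {
        (prev, curr): count / prev_counts[prev]
        for (prev, curr), count in bigram_counts.items()
    }


def write_random_probs(probs, path="bigram_probs.txt", n=100, seed=None):
    """Write up to n randomly chosen bigram probabilities, one per line."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(probs), min(n, len(probs)))
    with open(path, "w") as out:
        for prev, curr in chosen:
            out.write("p({}|{}) = {}\n".format(curr, prev, probs[(prev, curr)]))
    return chosen


if __name__ == "__main__":
    sample = "finger remarked . finger on . finger on . finger in . finger ."
    probs = bigram_probabilities(tokenize(sample))
    print("p(on|finger) =", probs[("finger", "on")])
```

Calling `write_random_probs(probs)` after building `probs` from the real corpus would produce the requested file. Sampling without replacement means a corpus with fewer than 100 distinct bigrams yields fewer than 100 lines; whether repeats are acceptable instead is something the assignment text does not say.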