
Python 1. Creating the word dictionary [Coding only: save code as problem1.py ]


Question

Python

1. Creating the word dictionary [Coding only: save code as problem1.py] The first step in building an n-gram model is to create a dictionary (a Java map or a Python dictionary) keyed by word, which we’ll use to access the elements corresponding to that word in a vector or matrix of counts or probabilities. You’ll create this dictionary from the given data files (select one file for training) for all unique words. You’ll need to split each sentence (treat each line as a sentence) into a list of words and convert each word to lowercase before storing it in the dictionary.

For example, I have a text file abc.txt

Explanation / Answer

NOTE: The question is not entirely clear to me. After reading it several times, my understanding is that you want to build a dictionary of the words in a file, so that is what the code below does. If this is not the expected answer, please state your requirement clearly, with an example, and I will reply within 24 hours.

Code:

#!/usr/bin/python

import sys

# Program to count the unique words in a given file
# script name : uniq_word.py

def main():
    # Take the filename from the command line and validate the arguments
    if len(sys.argv) != 2:
        print("One argument must be passed and is a filename. Usage: " + sys.argv[0] + " <filename>")
        sys.exit(1)

    filename = sys.argv[1]
    word_dict = {}

    # Open the file in read mode; the with block closes it automatically
    with open(filename, "r") as fp:
        for line in fp:
            # split() with no arguments splits on any run of whitespace
            words = line.split()

            for word in words:
                # Lowercase so that "Hello" and "hello" share one entry
                word = word.lower()
                # First time we see the word: start its count at 1
                if word not in word_dict:
                    word_dict[word] = 1
                # Word already in the dictionary: increment its frequency
                else:
                    word_dict[word] += 1

    # Print each unique word with its frequency
    for word in word_dict:
        print(word + "=" + str(word_dict[word]))

if __name__ == '__main__':
    main()

Execution and output:
Unix Terminal> cat test
this is a program file to test word_length
hello how are you
hello world is the first program
Unix Terminal> python uniq_word.py test
a=1
the=1
how=1
this=1
is=2
to=1
program=2
word_length=1
are=1
file=1
test=1
world=1
you=1
hello=2
first=1
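
The question mentions using the dictionary to access the element for a word in a vector or matrix of counts. One way to read that is a mapping from each word to an integer index. The sketch below is only an illustration under that assumption, not the required solution; the function names build_vocab and count_vector are hypothetical, and the sample lines are taken from the test file shown above.

```python
def build_vocab(lines):
    """Map each unique lowercased word to an integer index (hypothetical helper)."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            if word not in vocab:
                # Assign the next free index in order of first appearance
                vocab[word] = len(vocab)
    return vocab


def count_vector(lines, vocab):
    """Return a list where counts[vocab[w]] is the frequency of word w."""
    counts = [0] * len(vocab)
    for line in lines:
        for word in line.lower().split():
            counts[vocab[word]] += 1
    return counts


if __name__ == "__main__":
    # Same three lines as the test file used in the run above
    sample = [
        "this is a program file to test word_length",
        "hello how are you",
        "hello world is the first program",
    ]
    vocab = build_vocab(sample)
    counts = count_vector(sample, vocab)
    # Look up a word's count through its index
    print(vocab["hello"], counts[vocab["hello"]])  # prints "8 2"
```

With this layout the dictionary holds only positions, so the same index can later address a row or column of a bigram count matrix as well as the unigram vector.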
