Question
The file contains the skeleton of a Python program to do a simple analysis of a text file: it will display the number of unique words which appear in the file, along with the number of times each word appears. Case does not matter: the words “pumpkin”, “Pumpkin” and “PUMPKIN” should be treated as the same word.
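For context, the whole counting task can be sketched in a few lines with the standard library's collections.Counter. This is an illustrative alternative only, not the skeleton the assignment uses (the helper name count_words is made up here):

```python
from collections import Counter
import string

def count_words(text):
    # lower-case everything, strip surrounding punctuation,
    # and drop any tokens that become empty after stripping
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return Counter(w for w in words if w)

print(count_words("Pumpkin pumpkin, PUMPKIN! pie"))
```

The same requirements (case folding, punctuation stripping, empty-string removal) all appear as explicit steps in the assignment's skeleton below.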
Execute the program (which currently uses “document1.txt” as the data file) and inspect the output. (This is the barebones file.)
a. Replace each of the lines labeled “YOUR COMMENT” with meaningful comments to describe the work being done in the next block of statements. Use more than one comment line, if necessary.
b. Add doc strings to each function to describe the work being done in the function.
c. The program currently processes the empty string as a word. Revise the program to exclude empty strings from the collection of words.
d. The program currently processes words such as “The” and “the” as different words. Revise the program to ignore case when processing words.
e. The program currently always uses “document1.txt” as the input file. Revise the program to prompt the user for the name of the input file.
f. The program displays the words sorted by frequency of occurrence. Revise the program to also display the words sorted alphabetically.
g. Revise the program to display the collection of words sorted by greatest frequency of occurrence to least frequency, and sorted alphabetically for words with the same frequency count. Hint: since the “sorted” function and the “sort” method are stable sorts, you can first sort the words alphabetically, then sort them by reverse frequency.
h. Test the revised program. There are two sample documents available: “document1.txt” (The Declaration of Independence) and “document2.txt” (The Gettysburg Address).
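The stable-sort hint in part (g) can be demonstrated with a small standalone example (the list contents here are made up for illustration):

```python
from operator import itemgetter

pairs = [("banana", 2), ("apple", 2), ("cherry", 5)]
# first pass: sort alphabetically
pairs.sort(key=itemgetter(0))
# second pass: sort by descending count; because Python's sort is
# stable, "apple" stays before "banana" within the tied count of 2
pairs.sort(key=itemgetter(1), reverse=True)
print(pairs)  # [('cherry', 5), ('apple', 2), ('banana', 2)]
```

The two-pass approach works because a stable sort never reorders elements that compare equal under the current key, so the alphabetical order from the first pass survives within each frequency group.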
Explanation / Answer
import string
from operator import itemgetter

def add_word(word_map, word):
    """Record one occurrence of word in word_map."""
    # reduce the word to lower case so that case is ignored (part d)
    word = word.lower()
    # create an entry with count 0 if the word is not yet in word_map
    if word not in word_map:
        word_map[word] = 0
    # increment the word's count
    word_map[word] += 1

def build_map(in_file, word_map):
    """Read every line of in_file and record each word in word_map."""
    for line in in_file:
        # split the line into individual tokens
        word_list = line.split()
        for word in word_list:
            # remove surrounding whitespace and punctuation
            word = word.strip().strip(string.punctuation)
            # skip empty strings left over after stripping (part c)
            if word:
                add_word(word_map, word)

def display_map(word_map):
    """Display the words and their counts, first alphabetically, then
    from greatest to least frequency (alphabetical within ties)."""
    word_list = list()
    # copy each (word, count) pair from the map into a list
    for word, count in word_map.items():
        word_list.append((word, count))
    # sort alphabetically for the first display (part f)
    alpha_list = sorted(word_list, key=itemgetter(0))
    print(" {:15s}{:5s}".format("Word", "Count"))
    print("-" * 20)
    for item in alpha_list:
        print("{:15s}{:>5d}".format(item[0], item[1]))
    # re-sort by descending count; because "sorted" is stable, words
    # with equal counts remain in alphabetical order (part g)
    freq_list = sorted(alpha_list, key=itemgetter(1), reverse=True)
    print()
    print(" {:15s}{:5s}".format("Word", "Count"))
    print("-" * 20)
    for item in freq_list:
        print("{:15s}{:>5d}".format(item[0], item[1]))

def open_file(s):
    """Open the file named s for reading; return the file object,
    or None if the file cannot be opened."""
    try:
        in_file = open(s, 'r')
    except IOError:
        print("unable to open file " + s)
        in_file = None
    return in_file

def main():
    """Prompt for an input file name (part e), then build and display
    its word-count map."""
    word_map = dict()
    s = input('Enter the file name: ')
    in_file = open_file(s)
    if in_file != None:
        build_map(in_file, word_map)
        display_map(word_map)
        in_file.close()

main()
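To see why part (c)'s empty-string check is needed: a token made entirely of punctuation strips down to the empty string, which would otherwise be counted as a word. A quick standalone check:

```python
import string

# stripping punctuation from a normal token leaves the word itself
print('"Hello,"'.strip(string.punctuation))  # Hello
# a token that is all punctuation strips down to the empty string
dash = "--".strip(string.punctuation)
print(repr(dash))  # ''
# a truthiness test, as part (c) requires, filters such tokens out
print("counted" if dash else "skipped")  # skipped
```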