Question
The file contains the skeleton of a Python program to do a simple analysis of a text file: it will display the number of unique words which appear in the file, along with the number of times each word appears. Case does not matter: the words “pumpkin”, “Pumpkin” and “PUMPKIN” should be treated as the same word.
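For context, the whole counting task can be sketched in a few lines with the standard library's collections.Counter. This is an illustrative alternative only, not the skeleton the assignment uses (the helper name count_words is made up here):

```python
from collections import Counter
import string

def count_words(text):
    # lower-case everything, strip surrounding punctuation,
    # and drop any tokens that become empty after stripping
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return Counter(w for w in words if w)

print(count_words("Pumpkin pumpkin, PUMPKIN! pie"))
```

The same requirements (case folding, punctuation stripping, empty-string removal) all appear as explicit steps in the assignment's skeleton below.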
Execute the program (which currently uses “document1.txt” as the data file) and inspect the output. (This is the barebones file.)
a. Replace each of the lines labeled “YOUR COMMENT” with meaningful comments to describe the work being done in the next block of statements. Use more than one comment line, if necessary.
b. Add doc strings to each function to describe the work being done in the function.
c. The program currently processes the empty string as a word. Revise the program to exclude empty strings from the collection of words.
d. The program currently processes words such as “The” and “the” as different words. Revise the program to ignore case when processing words.
e. The program currently always uses “document1.txt” as the input file. Revise the program to prompt the user for the name of the input file.
f. The program displays the words sorted by frequency of occurrence. Revise the program to also display the words sorted alphabetically.
g. Revise the program to display the collection of words sorted by greatest frequency of occurrence to least frequency, and sorted alphabetically for words with the same frequency count. Hint: since the “sorted” function and the “sort” method are stable sorts, you can first sort the words alphabetically, then sort them by reverse frequency.
h. Test the revised program. There are two sample documents available: “document1.txt” (The Declaration of Independence) and “document2.txt” (The Gettysburg Address).
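The stable-sort hint in part (g) can be demonstrated with a small standalone example (the list contents here are made up for illustration):

```python
from operator import itemgetter

pairs = [("banana", 2), ("apple", 2), ("cherry", 5)]
# first pass: sort alphabetically
pairs.sort(key=itemgetter(0))
# second pass: sort by descending count; because Python's sort is
# stable, "apple" stays before "banana" within the tied count of 2
pairs.sort(key=itemgetter(1), reverse=True)
print(pairs)  # [('cherry', 5), ('apple', 2), ('banana', 2)]
```

The two-pass approach works because a stable sort never reorders elements that compare equal under the current key, so the alphabetical order from the first pass survives within each frequency group.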
Explanation / Answer
import string
from operator import itemgetter

def add_word(word_map, word):
    """Record one occurrence of word in word_map."""
    # reduce the word to lower case so that case is ignored (part d)
    word = word.lower()
    # create an entry with count 0 if the word is not yet in word_map
    if word not in word_map:
        word_map[word] = 0
    # increment the word's count
    word_map[word] += 1

def build_map(in_file, word_map):
    """Read every line of in_file and record each word in word_map."""
    for line in in_file:
        # split the line into individual tokens
        word_list = line.split()
        for word in word_list:
            # remove surrounding whitespace and punctuation
            word = word.strip().strip(string.punctuation)
            # skip empty strings left over after stripping (part c)
            if word:
                add_word(word_map, word)

def display_map(word_map):
    """Display the words and their counts, first alphabetically, then
    from greatest to least frequency (alphabetical within ties)."""
    word_list = list()
    # copy each (word, count) pair from the map into a list
    for word, count in word_map.items():
        word_list.append((word, count))
    # sort alphabetically for the first display (part f)
    alpha_list = sorted(word_list, key=itemgetter(0))
    print(" {:15s}{:5s}".format("Word", "Count"))
    print("-" * 20)
    for item in alpha_list:
        print("{:15s}{:>5d}".format(item[0], item[1]))
    # re-sort by descending count; because "sorted" is stable, words
    # with equal counts remain in alphabetical order (part g)
    freq_list = sorted(alpha_list, key=itemgetter(1), reverse=True)
    print()
    print(" {:15s}{:5s}".format("Word", "Count"))
    print("-" * 20)
    for item in freq_list:
        print("{:15s}{:>5d}".format(item[0], item[1]))

def open_file(s):
    """Open the file named s for reading; return the file object,
    or None if the file cannot be opened."""
    try:
        in_file = open(s, 'r')
    except IOError:
        print("unable to open file " + s)
        in_file = None
    return in_file

def main():
    """Prompt for an input file name (part e), then build and display
    its word-count map."""
    word_map = dict()
    s = input('Enter the file name: ')
    in_file = open_file(s)
    if in_file != None:
        build_map(in_file, word_map)
        display_map(word_map)
        in_file.close()

main()
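To see why part (c)'s empty-string check is needed: a token made entirely of punctuation strips down to the empty string, which would otherwise be counted as a word. A quick standalone check:

```python
import string

# stripping punctuation from a normal token leaves the word itself
print('"Hello,"'.strip(string.punctuation))  # Hello
# a token that is all punctuation strips down to the empty string
dash = "--".strip(string.punctuation)
print(repr(dash))  # ''
# a truthiness test, as part (c) requires, filters such tokens out
print("counted" if dash else "skipped")  # skipped
```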