Question

In Python 3,

The link needed is: http://www.cse.msu.edu/~cse231/Online/Labs/Lab10/

Thank you!

Consider the file named "lab10a.py". That file contains the skeleton of a Python program to do a simple analysis of two files: it will display the number of unique words which appear in the two files (the union of those two sets of words), as well as the number of unique words which are common to both files (the intersection of those two sets of words). Case does not matter: the words "pumpkin", "Pumpkin" and "PUMPKIN" should be treated as the same word. Only unique words should be counted: if a word appears more than once in a file, it should only be counted once. Note: remember to remove punctuation from words, i.e., "it," should be "it".

a. Replace the comments labeled "YOUR COMMENT" in function "build_word_set" with meaningful comments to describe the work being done in the next statement. Use more than one comment line, if necessary.

b. Revise function "compare_files" to accomplish the work described in the comments.

c. Test the revised program. There are two sample documents available: "document1.txt" (The Declaration of Independence) and "document2.txt" (The Gettysburg Address).

Demonstrate your completed program to your TA. On-line students should submit the completed program (named "lab10a.py") for grading via the Mimir system.

Part B: Programming with Dictionaries and Sets

Consider the file named "lab10b.py". That file contains the skeleton of a Python program to display information about the words in a document.

Function "main" is complete. It handles the interaction with the user and calls other functions to perform the appropriate tasks.

Function "print_word_index" is complete. It receives a dictionary in which each key is a word and each value is the set of line numbers where that word appears in a document. It displays all of the words (in alphabetic order), along with the line numbers for each word (in ascending order).
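The counting rules above boil down to two steps: normalize each word, then use set union and intersection. Here is a minimal sketch of just that idea, assuming words are normalized by lowercasing and stripping leading/trailing punctuation with str.strip(string.punctuation). The helper name normalize and the two sample sentences are made up for illustration and are not part of the lab skeleton.

```python
import string

def normalize(text):
    """Return the set of lowercase words in text, with leading/trailing punctuation removed."""
    words = (w.lower().strip(string.punctuation) for w in text.split())
    return {w for w in words if w}  # drop tokens that were pure punctuation

doc1 = "The pumpkin is big. PUMPKIN pie, anyone?"
doc2 = "A Pumpkin patch is big!"

set1 = normalize(doc1)
set2 = normalize(doc2)

print("Union:", len(set1 | set2))         # prints "Union: 8" (each word counted once)
print("Intersection:", len(set1 & set2))  # prints "Intersection: 3" (pumpkin, is, big)
```

Because strip() only removes characters at the ends of a token, an apostrophe inside a word such as "don't" is kept. That matches the "it," to "it" example in the assignment.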

Explanation / Answer

Completed scripts, as required:

lab10a.py

import string

def build_word_set( input_file ):

    word_set = set()

    for line in input_file:

        # Remove leading and trailing whitespace with strip(), then split
        # the line into a list of words at each run of whitespace.
        word_lst = line.strip().split()

        # Convert every word to lowercase so "Pumpkin" and "PUMPKIN" match,
        # and strip leading/trailing punctuation so "it," becomes "it".
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]

        for word in word_lst:

            # A token made only of punctuation (such as "--") is now empty;
            # skip it and add every non-empty word to the set. Adding a word
            # that is already present has no effect, so each word is kept once.
            if word != "":
                word_set.add( word )

    return word_set


def compare_files( file1, file2 ):

    # Build two sets:
    #   all of the unique words in file1
    #   all of the unique words in file2
    # Pass the file objects themselves: build_word_set iterates over lines,
    # and iterating over a string from read() would yield single characters.
    wordset1 = build_word_set( file1 )
    wordset2 = build_word_set( file2 )

    # Display the total number of unique words between the
    # two files. If a word appears in both files, it should
    # only be counted once.
    unique_word_count = len(wordset1.union(wordset2))
    print("Total unique words:", unique_word_count)

    # Display the number of unique words which appear in both
    # files. A word should only be counted if it is present in
    # both files.
    unique_word_in_both_count = len(wordset1.intersection(wordset2))
    print("Unique words that appear in both files:", unique_word_in_both_count)

######################################################################

f1 = open( "document1.txt" )
f2 = open( "document2.txt" )

compare_files( f1, f2 )

f1.close()
f2.close()
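One way to test the Part A logic without the two sample documents is to pass io.StringIO objects, which iterate line by line just like an open text file. The sketch below repeats a condensed build_word_set so it runs on its own. It also shows the pitfall of passing f.read() instead of the file object: iterating over a string yields single characters, not lines. The sample text is invented, and this is a testing aid, not the required lab10a.py.

```python
import io
import string

def build_word_set(input_file):
    """Collect the unique, lowercased, punctuation-stripped words from an iterable of lines."""
    word_set = set()
    for line in input_file:
        for w in line.split():
            word = w.lower().strip(string.punctuation)
            if word:
                word_set.add(word)
    return word_set

# io.StringIO behaves like an open text file: iterating it yields lines.
file1 = io.StringIO("Four score and seven years ago\nour fathers brought forth\n")
file2 = io.StringIO("We hold these truths\nour Fathers, and seven more.\n")

set1 = build_word_set(file1)
set2 = build_word_set(file2)
print("Total unique words:", len(set1 | set2))                      # 15
print("Unique words that appear in both files:", len(set1 & set2))  # 4

# Pitfall: a plain string iterates character by character,
# so every "line" is a single character and the words are lost.
chars = build_word_set("Four score")
print(sorted(chars))  # ['c', 'e', 'f', 'o', 'r', 's', 'u']
```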

lab10b.py


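import string

def build_word_index( input_file ):

    word_map = {}
    line_no = 0

    for line in input_file:

        # Count lines starting at 1
        line_no += 1

        # Split the line into words, lowercase them, and strip punctuation
        word_lst = line.strip().split()
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]

        for word in word_lst:
            if word != "":
                # Map each word to the set of line numbers where it appears;
                # setdefault creates an empty set the first time a word is seen.
                word_map.setdefault( word, set() ).add( line_no )

    return word_map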

def print_word_index( word_map ):

    index_lst = sorted(list(word_map.items()))

    for word, line_set in index_lst:
        line_lst = sorted(list(line_set))
        line_str = str( line_lst[0] )
        for line_no in line_lst[1:]:
            line_str += ", {}".format( line_no )
        print("{:14s}:".format(word), line_str )

## Alternative way to create the line_str
## line_str = ", ".join([str(i) for i in line_lst])

def main():

    filename = input( "Name of file to be processed: " )

    try:
        file = open( filename, "r" )

        index = build_word_index( file )

        print_word_index( index )

        file.close()

    except IOError:

        print( "Halting -- unable to open", filename )

main()
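The Part B data structure is a dictionary that maps each word to the set of line numbers where it appears. Below is a condensed, self-contained sketch of that index. It uses collections.defaultdict(set) and enumerate(..., start=1) as alternatives to dict.setdefault and a manual counter; either approach produces the same dictionary. The input text is made up, and io.StringIO stands in for an open file. This is an illustration, not the lab's reference solution.

```python
import io
import string
from collections import defaultdict

def build_word_index(input_file):
    """Map each normalized word to the set of 1-based line numbers it appears on."""
    word_map = defaultdict(set)  # missing keys start as an empty set
    for line_no, line in enumerate(input_file, start=1):
        for w in line.split():
            word = w.lower().strip(string.punctuation)
            if word:
                word_map[word].add(line_no)
    return dict(word_map)

def print_word_index(word_map):
    """Print words alphabetically, each followed by its line numbers in ascending order."""
    for word, line_set in sorted(word_map.items()):
        line_str = ", ".join(str(n) for n in sorted(line_set))
        print("{:14s}:".format(word), line_str)

text = io.StringIO("The cat sat.\nThe dog sat, too.\nA cat!\n")
index = build_word_index(text)
print_word_index(index)
```

Storing a set per word means a word that appears twice on the same line is still recorded only once for that line. Sorting happens only at print time.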
