document1.txt:www.cse.msu.edu/~cse231/Labs/Lab09/document1.txt document2.txt:www

ID: 3849006 • Letter: D

Question

document1.txt:www.cse.msu.edu/~cse231/Labs/Lab09/document1.txt

document2.txt:www.cse.msu.edu/~cse231/Labs/Lab09/document2.txt

Assignment overview

This lab exercise provides practice with dictionaries in Python.

A. Modify a program that uses dictionaries

Consider the file named "lab08a.py". That file contains the skeleton of a Python program to do a simple analysis of a text file: it will display the number of unique words which appear in the file, along with the number of times each word appears. Case does not matter: the words "pumpkin", "Pumpkin" and "PUMPKIN" should be treated as the same word. (The word "map" is used in identifiers because sometimes a dictionary is called a "map".)

Execute the program (which currently uses "document1.txt" as the data file) and inspect the output.

a. Replace each of the lines labeled "YOUR COMMENT" with meaningful comments to describe the work being done in the next block of statements. Use more than one comment line, if necessary.

b. Add doc strings to each function to describe the work being done in the function.

c. The program currently processes the empty string as a word. Revise the program to exclude empty strings from the collection of words.

d. The program currently processes words such as "The" and "the" as different words. Revise the program to ignore case when processing words.

e. The program currently always uses "document1.txt" as the input file. Revise the program to prompt the user for the name of the input file.

f. Revise the program to display the collection of words sorted by greatest frequency of occurrence to least frequency, and sorted alphabetically for words with the same frequency count. Since the sorted function and the sort method are stable sorts, you can first sort the words alphabetically, and then sort them by frequency (with reverse=True). (You do the two sorts in that order because you do the primary key last; frequency is the primary key in this case.) By default sorting is done on the first item in a list or tuple. To sort on other items use itemgetter from the operator module. See the documentation here, focusing on the students tuple example: https://docs.python.org/3/howto/sorting.html

g. Test the revised program. There are two sample documents available: "document1.txt" (The Declaration of Independence) and "document2.txt" (The Gettysburg Address).
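The two-pass sort described in step f can be sketched as below. This is only an illustration under assumptions: the sample counts are made up and are not taken from either lab document, and the names counts and pairs are hypothetical.

```python
from operator import itemgetter

# Hypothetical sample counts, not taken from either lab document.
counts = {"the": 3, "of": 3, "people": 2, "a": 3, "nation": 2}

# (word, count) tuples
pairs = list(counts.items())

# Secondary key first: alphabetical order by word (item 0).
pairs.sort(key=itemgetter(0))

# Primary key last: frequency (item 1), highest first.
# The sort is stable, so words with equal counts stay alphabetical.
pairs.sort(key=itemgetter(1), reverse=True)

for word, count in pairs:
    print(f"{count:3d}:{word}")
```

Because the second sort is stable, the three words with a count of 3 keep the alphabetical order produced by the first sort.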

Explanation / Answer

I am giving a solution for the first task. Let me know if you need any help with it.

lab09part1.py
---------------------------------------
import string

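def get_words(f_obj, my_dict):
    """Count the words in the open file f_obj, ignoring case and
    surrounding punctuation, and store the counts in my_dict."""
    for line in f_obj:
        # Split the line into whitespace-separated tokens
        line = line.strip()
        word_list = line.split()
        for word in word_list:
            # Normalize the token: lower case, no surrounding punctuation
            word = word.lower()
            word = word.strip(string.punctuation)
            # Skip empty strings; otherwise update the count for this word
            if word:
                if word in my_dict:
                    my_dict[word] += 1
                else:
                    my_dict[word] = 1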

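def print_alphabetic(my_dict):
    """Print word:count pairs in alphabetical order, four pairs per row."""
    # Build a list of (word, count) pairs from the dictionary
    # pairs_list = [(key,value) for key,value in my_dict.items()]
    pairs_list = []
    for key, value in my_dict.items():
        pairs_list.append((key, value))
    print('+' * 12)
    print('Words in alphabetical order as word:count pairs')
    # Tuples sort on their first item, so this sorts by word
    pairs_list.sort()
    # Start a new row after every fourth pair
    print_cols = 0
    for word, cnt in pairs_list:
        print('%13s:%3d' % (word, cnt), end=' ')
        if print_cols == 3:
            print()
            print_cols = 0
        else:
            print_cols += 1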

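def print_frequency(my_dict):
    """Print count:word pairs from highest count to lowest; words with
    the same count are printed in alphabetical order."""
    # Build a list of (count, word) pairs from the dictionary
    # pairs_list = [(value,key) for key,value in my_dict.items()]
    pairs_list = []
    for key, value in my_dict.items():
        pairs_list.append((value, key))

    print()
    print('+' * 12)
    print('Words in frequency order as count:word pairs')
    # Stable sorts: secondary key (word) first, primary key (count) last
    pairs_list.sort(key=lambda pair: pair[1])
    pairs_list.sort(key=lambda pair: pair[0], reverse=True)
    # Start a new row after every fourth pair
    print_cols = 0
    for cnt, word in pairs_list:
        print('%3d:%-13s' % (cnt, word), end=' ')
        if print_cols == 3:
            print()
            print_cols = 0
        else:
            print_cols += 1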

def main():
    """Prompt for a file name, count its words, and print both reports."""
    file_str = input('What file:')
    file_obj = open(file_str)
    my_dict = {}
    get_words(file_obj, my_dict)
    file_obj.close()
    print('There were %d unique words in the file %s' % (len(my_dict), file_str))
    print_alphabetic(my_dict)
    print_frequency(my_dict)
    return my_dict

main()
-------------------------------------------------------
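As a rough sketch of the counting step on its own, the same lower-case, strip-punctuation, skip-empty logic can be written with dict.get. The function name count_words and the sample text are hypothetical, and io.StringIO stands in for an open file so the sketch runs without either lab document.

```python
import io
import string


def count_words(lines):
    """Return a dict mapping each lower-cased word to its number of occurrences."""
    counts = {}
    for line in lines:
        for word in line.split():
            word = word.lower().strip(string.punctuation)
            if word:  # skip tokens that were only punctuation
                counts[word] = counts.get(word, 0) + 1
    return counts


# io.StringIO behaves like an open text file; the text itself is made up.
sample = io.StringIO("The pumpkin, the Pumpkin and THE PUMPKIN!\n-- and more --\n")
result = count_words(sample)
print(result)
```

The "--" tokens become empty strings once punctuation is stripped, so they are not counted, and the three spellings of "pumpkin" share one dictionary key.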
labPart2.py
-----------------------

import string

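def print_index(f_obj, main_words):
    """Return a dict mapping each word listed in main_words to the set of
    line numbers (starting at 1) on which it appears in f_obj."""
    my_dict = {}
    count = 1
    set1 = set()

    # Collect the words to be indexed from the main-words file
    for line in main_words:
        line = line.strip()
        word_list = line.split()
        for word in word_list:
            word = word.lower()
            word = word.strip(string.punctuation)
            if word != '':
                set1.add(word)

    # Record the line numbers on which each main word appears
    for line in f_obj:
        line = line.strip()
        word_list = line.split()
        for word in word_list:
            word = word.lower()
            word = word.strip(string.punctuation)
            if word:
                if word in my_dict:
                    b_set = my_dict[word]
                    b_set.add(count)
                else:
                    if word in set1:
                        a_set = set()
                        a_set.add(count)
                        my_dict[word] = a_set
        # Advance the line number once per line of the document
        count += 1

    return my_dict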

def pretty_print_index(my_dict):
    """Print each indexed word with its line numbers, in alphabetical order."""
    pairs_list = []

    for key, value in my_dict.items():
        # Sort the line numbers numerically before converting them to strings
        line_numbers = [str(n) for n in sorted(value)]
        pairs_list.append((key, line_numbers))
    print('Words in alphabetical order as word:line-number pairs')
    pairs_list.sort()

    # Print one word per output line
    for word, line_numbers in pairs_list:
        print("{:12s}".format(word), ":", ','.join(line_numbers))

##def compare_files(f_obj, g_obj):
##    #count = 0
##    #count2 = 0
##    set1 = set()
##    set2 = set()
##    for line in f_obj:
##        line = line.strip()
##        word_list = line.split()
##        for word in word_list:
##            word = word.lower()
##            word = word.strip(string.punctuation)
##            if word != '':
##                set1.add(word)
##        for line in g_obj:
##            line = line.strip()
##            word_list = line.split()
##            for word in word_list:
##                word = word.lower()
##                word = word.strip(string.punctuation)
##                if word != '':
##                    set2.add(word)
##
##    combined_set = set1 & set2
##    union_set = set1 | set2
##
##    print("Common words:",len(combined_set))
##    print("Total words:",len(union_set))

def main():
    """Prompt for a document and a main-words file, then print the index."""
    file_str = input('What file:')
    f_obj = open(file_str)
    #file2_str = input('What file:')
    #g_obj = open(file2_str)
    file3_str = input('What main words:')
    h_obj = open(file3_str)
    my_dict = print_index(f_obj, h_obj)
    pretty_print_index(my_dict)
    #compare_files(f_obj, g_obj)

    f_obj.close()
    h_obj.close()

main()

---------------------------------------------------------------
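The line-number index built in labPart2.py can also be sketched with enumerate and dict.setdefault. This is a minimal illustration, not the lab's required structure: build_index, the sample text, and the set of main words are all hypothetical, and io.StringIO stands in for the open document.

```python
import io
import string


def build_index(lines, main_words):
    """Map each word in main_words to the set of line numbers where it occurs."""
    index = {}
    # enumerate supplies the line number, starting at 1
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            word = word.lower().strip(string.punctuation)
            if word in main_words:
                index.setdefault(word, set()).add(line_number)
    return index


# Made-up text; the blank third line still counts as a line.
text = io.StringIO("red fish blue fish\none fish\n\ntwo Fish, red\n")
index = build_index(text, {"fish", "red"})

for word in sorted(index):
    numbers = ",".join(str(n) for n in sorted(index[word]))
    print(f"{word:12s}: {numbers}")
```

Sorting the line numbers as integers before converting them to strings keeps 10 after 2, which a sort on the strings would not.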
WordFrequency.py
------------------------------
import string

def print_frequency(my_dict):
    # pairs_list = [(value,key) for key,value in my_dict.items()]
    pairs_list = []
    for key, value in my_dict.items():
        pairs_list.append((value, key))

    return my_dict
