
Dan has been staring at the messages all week, but he has not made a breakthroug


Question

Dan has been staring at the messages all week, but he has not made a breakthrough decrypting the message. You need to make some progress soon, or the jewel thief will escape. "Maybe a greedy method will work," Dan says. "We can just assign the most frequent letter in the encoded message to the most frequently used English letter, and the second most frequent letter in the encoded message to the second most frequently used English letter. I'm sure that'll crack it!"

Score file

Dan offers you a csv file containing letter frequencies called 1-scores.csv. You will need to parse this file. Each line corresponds to a letter in our 28-letter alphabet. There is a comma on each line separating two columns. The first column is a string representing a symbol in our alphabet. The second column is a floating point number representing how common that particular symbol is. The higher the number, the more frequently that letter is used. If you are interested, the floating point number is the log odds ratio computed from dozens of novels downloaded from Project Gutenberg.

Write a function called 'read_scores' that takes a single string argument, a parameter named "input_file". Read in the file with that name and parse it. Return a dictionary that maps the strings in the first column to the floating point values in the second column.

Letter frequencies

Now, you need a function to count the frequency of letters in the encoded messages. Write a function called count_letters that takes a single string parameter named "message". This function should return a dictionary that maps each letter to the number of times it occurs in a string. Note: Initialize the count of all 28 letters in our alphabet to zero.

Decrypting the message

You are ready to try to decrypt the message from last week's assignment. Using the two functions you just wrote, create a dictionary named "translation" that maps the most frequent letters in the substitution alphabet to their corresponding letters in the plaintext alphabet. The most common letter in the substitution alphabet should map to " ". The second most common letter should map to "E". The third most common should map to "T". In the case of a tie (two letters have the same count), you can order those letters any way you choose. Note: We have not described how to do this section in detail. That was intentional. This is a problem for you to solve.

Call substitute

You now have a mapping between the substitution alphabet letters and their regular alphabet counterparts. You need to call the "substitution" function you wrote in Part 1 to convert the string back to plain text. Remember, substitute expects a list of characters in the order of our 28-character alphabet, so that is what we need to create. The "translation" dictionary maps letters in the substitution alphabet (the encoded message we have) to the message alphabet (the decoded message we want).

Call "substitution" to translate the message from the previous homework assignment and store the result in a string called "greedy_output".

Explanation / Answer

from pprint import pprint

# As far as I know, the English alphabet has 26 letters; add the remaining
# 2 symbols of the 28-symbol alphabet to the dictionary below.
alphabet = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}


def read_scores(input_file):
    """
    Map the strings in the first column to the floating point values in the
    second column.

    :param input_file: name of a .csv file with 2 columns: letter, log odds ratio
    :return: dictionary {'letter': log odds ratio, ...}
    """
    scores = {}
    with open(input_file) as csv_file:
        for line in csv_file:
            line = line.rstrip('\r\n')
            if not line:
                continue  # skip blank lines
            # Split at the last comma to get the two comma separated values.
            letter, odd_ratio = line.rsplit(',', 1)
            scores[letter] = float(odd_ratio)
    return scores


def count_letters(message):
    """Count how many times each letter of the alphabet occurs in message."""
    counts = dict(alphabet)  # copy, so the original alphabet is not overwritten
    for letter in message:
        # Beware: uppercase letters are not equal to lowercase ones!
        # If your alphabet is uppercase, use upper() here and in `alphabet`.
        letter = letter.lower()
        if letter in counts:
            counts[letter] += 1
    return counts


# Call the functions
letter_scores = read_scores('1-scores.csv')
with open('message.txt') as message_file:
    substitution_alphabet = count_letters(message_file.read())

# Order the letters in both dictionaries by frequency, most frequent first.
substitution_alphabet_ordered_keys = sorted(
    substitution_alphabet, key=substitution_alphabet.get, reverse=True)
letter_scores_ordered_keys = sorted(
    letter_scores, key=letter_scores.get, reverse=True)

# Pair the most frequent letters in the encrypted document with the most
# frequent ones in 1-scores.csv.
translation = dict(zip(substitution_alphabet_ordered_keys,
                       letter_scores_ordered_keys))

# The translation dictionary is the one you will use to decrypt the message!
pprint(translation)

I hope this code does what you need. Since you didn't post the original encrypted message or 1-scores.csv, I can't confirm the result. If you have any doubts, please post them in the comments section and I will be glad to answer. Good luck!
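The answer above stops once the translation dictionary is built. The last step in the question still needs that dictionary turned into a list ordered like the 28-character alphabet and passed to the Part 1 function. Below is a rough sketch of that step under several assumptions: the Part 1 function is not shown in the question, so `substitute` here is a hypothetical stand-in; `ALPHABET` (uppercase letters plus space and period) is only a guess at the 28 symbols; and `translation_to_key` is a helper name made up for illustration. Swap in your own Part 1 function and the real alphabet, and make sure the letter case matches your translation dictionary.

```python
import string

# ASSUMPTION: the 28 symbols are A-Z, space, and period.
# Replace this with the alphabet actually used in the assignment.
ALPHABET = list(string.ascii_uppercase) + [" ", "."]


def substitute(message, key):
    """Hypothetical stand-in for the Part 1 function.

    Replaces each symbol ALPHABET[i] found in message with key[i].
    Symbols outside ALPHABET are passed through unchanged.
    """
    lookup = dict(zip(ALPHABET, key))
    return "".join(lookup.get(ch, ch) for ch in message)


def translation_to_key(translation):
    """Turn the translation dict into a list ordered like ALPHABET.

    Symbols missing from the dict map to themselves.
    """
    return [translation.get(symbol, symbol) for symbol in ALPHABET]


# Toy translation for illustration: only three symbols are remapped.
translation = {"X": " ", "Q": "E", "Z": "T"}
key = translation_to_key(translation)

greedy_output = substitute("ZQXZQ", key)
print(greedy_output)  # prints "TE TE"
```

With the real data, `translation` would be the dictionary built from the two sorted key lists, and the message would be the encoded text from the previous assignment.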