Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

This needs to be done in Python. I have no idea where to begin, as I have never

ID: 3793045 • Letter: T

Question

This needs to be done in Python. I have no idea where to begin, as I have never used Python before. Can someone give me an idea of where to start or how I should begin the code for this problem? Thanks

Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet Y DA, ...,z], and let the function f(z) return the index of a letter zi E y, e.g., A 1 and f(z) 26. Finally, for an ma-character string z E Sm, define h (r) 1 f(zi)] mod e), where l is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string and maps that value onto one of the l buckets. (a) The following list contains US Census derived last names: http:// census.gov/topics/genealogy/ 1990surnames/dist.all.last Using these names as input strings, first choose a uniformly random 50% of these name strings and then hash them using h (z). Produce a histogram showing the corresponding distribution of hash locations when l 200. Label the axes of your figure. Brief description what the figure shows about h(a); justify your results in terms of the behavior of h(a). Do not forget to append your code Hint: the raw file includes information other than the name strings, which will need to be removed; and, think about how you can count hash locations without building or using a real hash table.

Explanation / Answer

import matplotlib.pyplot as plt
import numpy as np
import random

def get_hash_index(last_name, bucket):
hash = 0
for c in last_name:
hash += ord(c) - ord('A') + 1
hash = hash % bucket
return hash

last_name_hash = {}

bucket_size = 200

for i in xrange(0, bucket_size):
last_name_hash[i] = 0

names_list = []
with open("dist.all.last.txt") as fp:
for line in fp:
last_name = line.split()[0]
names_list.append(last_name)

k = len(names_list)/2
indicies = random.sample(xrange(len(names_list)), k)
last_names_list = [names_list[i] for i in indicies]

for name in last_names_list:
hash_value = get_hash_index(name,bucket_size)
last_name_hash[hash_value] += 1

print last_name_hash

hash_as_list = []

for key in sorted(last_name_hash.iterkeys()):
hash_as_list.append(last_name_hash[key])

print hash_as_list

b = range(0,bucket_size)
plt.bar(b, hash_as_list)
#plt.hist(hash_as_list, bins=b)
plt.title("Histogram showing distribution of hash of last names")
plt.xlabel("Bucket number")
plt.ylabel("Size of bucket")
plt.show()

# in case indentation is messed up. Access http://pastebin.com/hKd5jSDf

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote