Consider the following hash function. Let U be the universe of strings composed
ID: 3207791 • Letter: C
Question
Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet sigma = [A, ..., Z], and let the function f(x_i) return the index of a letter x_i sigma, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string x sigma^m, define h(x) = ([sigma_i = 1^m f(x_i)] mod l), where l is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the l buckets. The following list contains US Census derived last names: Using these names as input strings, first choose a uniformly random 50% of these name strings and then hash them using h(x). Produce a histogram showing the corresponding distribution of hash locations when l = 200. Label the axes of your figure. Brief description what the figure shows about h(x); justify your results in terms of the behavior of h(x). Do not forget to append your code.Explanation / Answer
import matplotlib.pyplot as plt
import numpy as np
import random
def get_hash_index(last_name, bucket):
hash = 0
for c in last_name:
hash += ord(c) - ord('A') + 1
hash = hash % bucket
return hash
last_name_hash = {}
bucket_size = 200
for i in xrange(0, bucket_size):
last_name_hash[i] = 0
names_list = []
with open("dist.all.last.txt") as fp:
for line in fp:
last_name = line.split()[0]
names_list.append(last_name)
k = len(names_list)/2
indicies = random.sample(xrange(len(names_list)), k)
last_names_list = [names_list[i] for i in indicies]
for name in last_names_list:
hash_value = get_hash_index(name,bucket_size)
last_name_hash[hash_value] += 1
print last_name_hash
hash_as_list = []
for key in sorted(last_name_hash.iterkeys()):
hash_as_list.append(last_name_hash[key])
print hash_as_list
b = range(0,bucket_size)
plt.bar(b, hash_as_list)
#plt.hist(hash_as_list, bins=b)
plt.title("Histogram showing distribution of hash of last names")
plt.xlabel("Bucket number")
plt.ylabel("Size of bucket")
plt.show()
# in case indentation is messed up. Access http://pastebin.com/hKd5jSDf
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.