Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Consider the following hash function. Let U be the universe of strings composed

ID: 3207791 • Letter: C

Question

Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet sigma = [A, ..., Z], and let the function f(x_i) return the index of a letter x_i sigma, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string x sigma^m, define h(x) = ([sigma_i = 1^m f(x_i)] mod l), where l is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the l buckets. The following list contains US Census derived last names: Using these names as input strings, first choose a uniformly random 50% of these name strings and then hash them using h(x). Produce a histogram showing the corresponding distribution of hash locations when l = 200. Label the axes of your figure. Brief description what the figure shows about h(x); justify your results in terms of the behavior of h(x). Do not forget to append your code.

Explanation / Answer

import matplotlib.pyplot as plt
import numpy as np
import random

def get_hash_index(last_name, bucket):
hash = 0
for c in last_name:
hash += ord(c) - ord('A') + 1
hash = hash % bucket
return hash

last_name_hash = {}

bucket_size = 200

for i in xrange(0, bucket_size):
last_name_hash[i] = 0

names_list = []
with open("dist.all.last.txt") as fp:
for line in fp:
last_name = line.split()[0]
names_list.append(last_name)

k = len(names_list)/2
indicies = random.sample(xrange(len(names_list)), k)
last_names_list = [names_list[i] for i in indicies]

for name in last_names_list:
hash_value = get_hash_index(name,bucket_size)
last_name_hash[hash_value] += 1

print last_name_hash

hash_as_list = []

for key in sorted(last_name_hash.iterkeys()):
hash_as_list.append(last_name_hash[key])

print hash_as_list

b = range(0,bucket_size)
plt.bar(b, hash_as_list)
#plt.hist(hash_as_list, bins=b)
plt.title("Histogram showing distribution of hash of last names")
plt.xlabel("Bucket number")
plt.ylabel("Size of bucket")
plt.show()

# in case indentation is messed up. Access http://pastebin.com/hKd5jSDf

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote