1. Consider the following hash function. int hashFunc(int x) { return (x * 2) %
ID: 3868110 • Letter: 1
Question
1. Consider the following hash function. int hashFunc(int x) { return (x * 2) % HASH_SIZE; } Assume HASH_SIZE = 10. Here is the hash table’s insert function: void insert(int key) { int index = hashFunc(key); hash_array[index].push_back(key); } where hash_array is an array of list’s.
2. Draw the state of hash_array on the right side after the following calls. Assume this is a chaining hash table (each element in the array is a linked list of records). The first two elements are drawn in there for you.
insert(7); insert(1); insert(23); insert(14); insert(19); insert(53); insert(37); insert(83); Is hashFunc() a good hash function? Why or why not?
3. You are hired to design a website called brutionary.com, a dictionary.com variant. You are given a list of dictionary words (and their definitions) in a text file, and would like to preprocess it, such that you can readily provide information to users who visit your website and look up words. Assume that your dictionary is not going to be updated once it is preprocessed. We still want your system to “scale” -- that is, it should be able to efficiently take care of a lot of queries that may come in. Assume a Word structure like the following is used to store each word and definition. struct Word { string word; string definition; }; (a) First of all, the users should be able to look up words. Which option would you take, and why? [Option A] Store Words in a binary search tree. [Option B] Store Words in a hash table. (b) You want to add a functionality that prints all words within a specified range (e.g., all words between abstain and abstract). Which option would you take, and why? [Option A] Add a sorted linked list with pointers to Words in the structure used in (a). Each time a range [x:y] is specified, we search x in the list, and then traverse the list to print each word, until we hit y. [Option B] Add a sorted vector with pointers to Words in the structure used in (a). Each time a range [x:y] is specified, we search x in the vector, and then traverse the vector to print each word, until we hit y. (c) In the right corner of the website, you want to display “k most popular words”, where k is some integer, which are determined by queries that were received in the past hour, and is updated every hour. Assume queries are made on n distinct words, where n >> k. Which option would you take, and why? [Option A] In the beginning of every hour, create a hash table (initially empty) that stores (word, count)-pairs. Each time a query for a word x comes in, look up x in the hash table (or add a new one if one does not exist), and increase the count for x. At the end of the hour, iterate through all (word, count)-pairs in the hash table and store them in a maxheap, using their counts as the keys. Then extract k words from the heap. [Option B] In the beginning of every hour, create a vector (initially empty) that stores (word, count)- pairs. Each time a query for a word x comes in, look up x in the vector (or add a new one if one does not exist), and increase the count for x. At the end of the hour, sort the pairs in the vector in the decreasing order of their counts, and print the first k words. Note: There may, of course, be a better design choice to each problem than ones presented above. Feel free to discuss other possibilities with your peers.
Explanation / Answer
0
1
2 -->(1)
3
4 -->(7)-->(37)
5
6 -->(23)-->(53)-->(83)
7
8 -->(14)-->(19)
9
The hash functon is not efficent because it can only use half of the hash table with even place holders beacuse x*2 is an even number and even mod 10 will extract the last digit only which is always even and can never be odd
3
a)I would choose [Option B] Store Words in a hash table, because this will allow me to search in O(1) time period (average case time complexity of hash search is O(1))
b)[Option B] Add a sorted vector with pointers to Words in the structure used in (a) because Vectors should do well in terms of caching (less cache miss) since vectors are stored in non-contiguous memory locations
c) In the beginning of every hour, create a hash table (initially empty) that stores (word, count)-pairs. Each time a query for a word x comes in, look up x in the hash table (or add a new one if one does not exist), and increase the count for x. At the end of the hour, iterate through all (word, count)-pairs in the hash table and store them in a maxheap, using their counts as the keys. Then extract k words from the heap, because it will optimize the count updation problem since searcing words using hash is much faster and heap sort is not a bad sorting algorithm in terms of performance.
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.