Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Your team in charge of database administration was discussing different alternat

ID: 3847033 • Letter: Y

Question

Your team in charge of database administration was discussing different alternatives for indexing your organization’s databases. Some tables in one database have very few insertions but they are used intensively by different services to check for information about items using the item_ID number. While many of your colleagues proposed using a tree index, you argued for a Hash index for these tables because it provides an average-case search cost of only slightly more than one disk I/O. The team leader agrees to adopt your solution but has asked you to write a short explanation for two questions:

a.How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure? (6 marks)

b.If a Linear Hashing index using Alternative (1) for data entries contains 10,000 records, with 10 records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost? (6 marks)

Explanation / Answer

a).If we begin with an index which has B buckets, during the round all the buckets will be split in order, in steady progression.A hash function is required to convey the search key values consistently in all the buckets.This sort of split during the round causes a redistribution of key values in all the buckets.If a bucket contains overflow pages, after the redistribution it is most probable that the length of the overflow chain diminishes. If the hash function is good, the length of the overflow chains in most buckets is zero in light of the fact that in each round there will be no less than one redistribution of the values in each bucket. The count of overflow pages during the round is not supposed to go beyond one because the hash function distributes the incoming entries consistently.

b). worst-case cost for equality search:

N /0.8P

which makes it equal to 10000/(0.8*10)=1250.

This can be done when all the keys map into the same bucket. (The effect of 80% occupancy is to increase the number of pages in the file, relative to a file with 100% occupancy.)