You are interested in analyzing some hard-to-obtain data from two separate datab


Question

You are interested in analyzing some hard-to-obtain data from two separate databases. Each database contains n numerical values – so there are 2n values total – and you may assume that no two values are the same. You’d like to determine the median of this set of 2n values, which we will define here to be the nth smallest value. However, the only way you can access these values is through queries to the databases. In a single query, you can specify a value k to one of the two databases, and the chosen database will return the kth smallest value it contains. Since queries are expensive, you’d like to compute the median using as few queries as possible. Give an algorithm that finds the median value using at most O(log n) queries.

Give a sketch of the algorithm with an explanation and show the running time complexity O(log n) by building a recurrence relation. We can consider the two data sets (or databases) virtually sorted through queries for finding the kth smallest data item, so let’s denote the kth smallest data item of each database as A[k] and B[k] where A and B denote the two databases of size n each. Hint: the size of each data set (or database) can be reduced to half at each recursion.

Explanation / Answer

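Let DB1 and DB2 be the two databases. For simplicity, assume n is a power of two with n >= 2, so that every step size n/2^i below is an integer (log denotes the base-2 logarithm). Our algorithm is as follows:

Algorithm 1 Median finding algorithm for joint databases
p1 = p2 = n/2 // two query pointers; p1 + p2 = n throughout
for i = 2 to log n do
    m1 = QueryDB(DB1, p1) // median of the surviving part of DB1
    m2 = QueryDB(DB2, p2) // median of the surviving part of DB2
    if m1 > m2 then
        p1 = p1 - n/2^i // next time, query the median of the lower half of DB1
        p2 = p2 + n/2^i // next time, query the median of the upper half of DB2
    else
        p1 = p1 + n/2^i
        p2 = p2 - n/2^i
    end if
end for
m1 = QueryDB(DB1, p1) // two candidates remain in each database
m2 = QueryDB(DB2, p2)
if m1 > m2 then
    return min(m1, QueryDB(DB2, p2 + 1))
else
    return min(QueryDB(DB1, p1 + 1), m2)
end if

In the above algorithm, p1 and p2 are the query pointers for the two databases. We first query
the medians of both databases to obtain m1 and m2. We show that the median of the joint database
must lie between m1 and m2. To see this, observe that at least n records in DB1 and DB2 are
smaller than or equal to max(m1,m2), namely the n/2 smallest records of each database. Hence,
the median of the joint database is not greater than max(m1,m2). Similarly, at most n - 2 records
are smaller than min(m1,m2), so the median of the joint database is not smaller than min(m1,m2).
It follows that if m1 > m2, the n/2 largest records of DB1 lie above the joint median and the n/2
smallest records of DB2 lie below it (and symmetrically if m1 < m2). Discarding both groups leaves
the same problem at half the size: the median is now the (n/2)th smallest of the remaining n
records. We move the pointers p1 and p2 to the medians of the surviving halves accordingly. (To
visualize, in Figure 3, the shaded parts of both databases can actually be discarded.) By the end
of the loop, two candidates remain in each database, p1 and p2 point at the smaller candidate of
each pair, and the median is the second smallest of these four values. The smaller of m1 and m2 is
the smallest of the four, so we return the minimum of the larger of m1 and m2 and the record that
follows the smaller one in its own database, at the cost of one extra query.

Let T(n) be the total number of queries. As each round reduces the problem size by half using
two queries, we have T(n) = T(n/2) + 2, with T(2) = 3 for the final step. Unrolling the
recurrence gives T(n) = 2(log n - 1) + 3 = 2 log n + 1, so T(n) = O(log n).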
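
The halving idea can be sketched in Python as follows. This is a minimal sketch rather than a line-by-line transcription of Algorithm 1: instead of the two pointers, it tracks how many of the smallest records have been discarded from each database plus the size of the surviving window, which lets it handle any n >= 1, not only powers of two. The names make_database and joint_median are hypothetical, and the in-memory "database" is only an assumed stand-in for the real query interface.

```python
def make_database(values):
    """Hypothetical stand-in for one database: returns a query function
    and a counter so the number of queries can be inspected."""
    ordered = sorted(values)
    stats = {"queries": 0}

    def query(k):
        # Return the k-th smallest value (1-indexed), as in the problem statement.
        stats["queries"] += 1
        return ordered[k - 1]

    return query, stats


def joint_median(query_a, query_b, n):
    """Return the n-th smallest of the 2n distinct values held by two
    databases of n values each, using O(log n) queries."""
    a = b = 0    # number of smallest records already discarded from each database
    size = n     # each database still contributes `size` candidates
    # Invariant: the answer is the size-th smallest of the 2*size candidates.
    while size > 1:
        k = (size + 1) // 2   # position of the window median, rounded up
        drop = size // 2      # records that can be discarded on each side
        if query_a(a + k) < query_b(b + k):
            # The `drop` smallest candidates of A lie below the joint median;
            # the `drop` largest candidates of B lie above it (B's offset stays).
            a += drop
        else:
            # Symmetric case: discard from the bottom of B and the top of A.
            b += drop
        size = k
    # One candidate is left in each database; the median is the smaller one.
    return min(query_a(a + 1), query_b(b + 1))


query_a, stats_a = make_database([1, 3, 5, 7])
query_b, stats_b = make_database([2, 4, 6, 8])
print(joint_median(query_a, query_b, 4))          # 4
print(stats_a["queries"] + stats_b["queries"])    # 6
```

Each pass of the loop issues two queries and halves the window (rounding up), so the sketch uses 2⌈log n⌉ + 2 queries in total, in line with the recurrence T(n) = T(n/2) + 2.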
