Q3. You are in charge of the Genghis (\'We execute fast\') search engine. You ar
ID: 3719747 • Letter: Q
Question
Q3. You are in charge of the Genghis ('We execute fast') search engine. You are designing your server cluster to handle 500 Million hits a day and 10 billion pages of indexed data. Each machine costs $1000, and can store 10 million pages and respond to 200 queries per second (against these pages).
1. If you were given a budget of $500,000 dollars for purchasing machines, and were required
to index all 10 billion pages, could you do it?
$500,000 is equivalent to 500000/1000 = 500 machines. This is equivalent to 500 * 10 Million pages = 5 * 109 pages. Which is equivalent to 5 billion pages. However the requirement is for 10 billion pages and thus we cannot do it.
2. What is the minimum budget to index all pages? If you assume that each query can
be answered by looking at data in just one (10 million page) partition, and that queries
are uniformly distributed across partitions, what peak load (in number of queries per
second) can such a cluster handle?
Minimum cost would be equivalent to $1,000,000 = 1 Million Dollars. This would allow us to have enough machines to index around 10 billion pages in 10 million page partitions on 1000 machines. The server would be able to handle around 200 * 1000 = 200,000 queries per second.
3. How would your answer to the previous question change if each query, on average, accessed
two partitions?
Then the answer would be around 100,000 queries per second. Because two machines together are expected to respond to 200 queried per second.
4. What is the running budget required to handle the desired load of 500 million hits per
day if all queries are on a single partition? Assume that queries are uniformly distributed
with respect to time of day.
5. Would your answer to the previous question change if the number of queries per day
went up to 5 billion hits per day? How would it change if the number of pages went up
to 100 billion'?
6. Assume that each query accesses just one partition, that queries are uniformly distributed
across partitions, but that at any given time, the peak load on a partition is upto 10
times the average load. What is the minimum budget for purchasing machines in this
scenario?
7. Take the cost for machines [take the previous question and multiply it by 10 to reflect
the costs of maintenance, administration, network bandwidth, etc. This amount is your
annual cost of operation. Assume that you charge advertisers 2 cents per page. What
fraction of your inventory (i.e., the total number of pages that you serve over the course
of a year) do you have to sell in order to make a profit?
Explanation / Answer
4) Total hits: 500,000,000, Each machine capacity= 200 queries/second ,
Each machine in a day= Total seconds in a day * Each machine capacity=86400*200=17,280,000
Total machine required= Total hits/ Each machine capacity in a day
=500,000,000 / 17,280,000
=28.94 or 30 machines
Each machine cost=$1000
Total cost= 30*1000=$30,000
5) If the number of queries increase to 5 billion then the cost will become $300,000 as 300 machines will be required.
If the number of pages become 100 billion then cost will become, Total pages/ Each machine's page handling capacity..
=100 billion/10 million pages= 10,000 machines
= Each machine cost $1000* Total machines
=$10,000,000
6) 10 times the average load means 500 million hits* 10 = 5,000,000,000
Hits per second per machine=200
Total machines= 5,000,000,000/200*86400
=290*Cost per machine($1000)
=$290,000
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.