Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Q3. You are in charge of the Genghis (\'We execute fast\') search engine. You ar

ID: 3719747 • Letter: Q

Question

Q3. You are in charge of the Genghis ('We execute fast') search engine. You are designing your server cluster to handle 500 Million hits a day and 10 billion pages of indexed data. Each machine costs $1000, and can store 10 million pages and respond to 200 queries per second (against these pages).

1. If you were given a budget of $500,000 dollars for purchasing machines, and were required

to index all 10 billion pages, could you do it?

$500,000 is equivalent to 500000/1000 = 500 machines. This is equivalent to 500 * 10 Million pages = 5 * 109 pages. Which is equivalent to 5 billion pages. However the requirement is for 10 billion pages and thus we cannot do it.

2. What is the minimum budget to index all pages? If you assume that each query can

be answered by looking at data in just one (10 million page) partition, and that queries

are uniformly distributed across partitions, what peak load (in number of queries per

second) can such a cluster handle?

Minimum cost would be equivalent to $1,000,000 = 1 Million Dollars. This would allow us to have enough machines to index around 10 billion pages in 10 million page partitions on 1000 machines. The server would be able to handle around 200 * 1000 = 200,000 queries per second.

3. How would your answer to the previous question change if each query, on average, accessed

two partitions?

Then the answer would be around 100,000 queries per second. Because two machines together are expected to respond to 200 queried per second.

4. What is the running budget required to handle the desired load of 500 million hits per

day if all queries are on a single partition? Assume that queries are uniformly distributed

with respect to time of day.

5. Would your answer to the previous question change if the number of queries per day

went up to 5 billion hits per day? How would it change if the number of pages went up

to 100 billion'?

6. Assume that each query accesses just one partition, that queries are uniformly distributed

across partitions, but that at any given time, the peak load on a partition is upto 10

times the average load. What is the minimum budget for purchasing machines in this

scenario?

7. Take the cost for machines [take the previous question and multiply it by 10 to reflect

the costs of maintenance, administration, network bandwidth, etc. This amount is your

annual cost of operation. Assume that you charge advertisers 2 cents per page. What

fraction of your inventory (i.e., the total number of pages that you serve over the course

of a year) do you have to sell in order to make a profit?

Explanation / Answer

4) Total hits: 500,000,000, Each machine capacity= 200 queries/second ,

Each machine in a day= Total seconds in a day * Each machine capacity=86400*200=17,280,000
Total machine required= Total hits/ Each machine capacity in a day
=500,000,000 / 17,280,000
=28.94 or 30 machines
Each machine cost=$1000
Total cost= 30*1000=$30,000

5) If the number of queries increase to 5 billion then the cost will become $300,000 as 300 machines will be required.
If the number of pages become 100 billion then cost will become, Total pages/ Each machine's page handling capacity..
=100 billion/10 million pages= 10,000 machines
= Each machine cost $1000* Total machines
=$10,000,000

6) 10 times the average load means 500 million hits* 10 = 5,000,000,000
Hits per second per machine=200
Total machines= 5,000,000,000/200*86400
=290*Cost per machine($1000)
=$290,000