Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Join size estimate and primary key. (10 points) Consider three relations rl, r2

ID: 3879938 • Letter: J

Question

Join size estimate and primary key. (10 points) Consider three relations rl, r2 and r3, with schemas R1(ABC), R2CDE) and R3(EF), respectively (underlined attributes indicate primary keys). Assume that rl has 1000 tuples, r2 has 600 tuples, and r3 has 1,200 tuples. (i Estimate the size of natural join of three relations rl x r2 x r3. Explain how to get your answer ii Now suppose in R1, C (instead of A) is the primary key (i.e., we have RI(ABC)). Estimate the size of natural join of three relations rl x r2 x r3 Explain how to get your answer.

Explanation / Answer

(i) Size of natural joins

Natural join is a type of equi join where identical columns appear only once

The natural join r1 |x| r2 |x| r3 can be split into (r1 natural join r2) followed by natural join r3

r1 |x| r2:

Now coming to r1 |x| r2, the size of natural join can be maximum 1000 tuples.

Reason: Here C is the primary key for r2, so if all the tuples in r1 contain a possible value for C, there can be 1000 rows. If none of the rows contain a possible value for C, there will be 0 rows. So, the range for r1 |x| r2 lies between 0 and 1000.

(r1 |x| r2) |x| r3

Now, when the above result is naturally joined with r3, i.e., (r1 |x| r2) |x| r3, the size of natural join can be maximum 1000 tuples.

Reason: Now, E is the primary key for r3. So, assuming there are 1000 rows in r1 |x| r2 and if all the tuples from (r1 |x| r2) contain a possible value for column E, there can be 1000 rows. On the other hand, if none of the rows in r1 |x| r2 contain a possible value for column E, there will be 0 rows.So, the range for r1 |x| r2 |x| r3 can lie anywhere between 0 and 1000 and can be maximum 1000.

(ii) C is the primary key of R1.

r1 |x| r2:

Now coming to r1 |x| r2, the size of natural join will be maximum 600 tuples.

Reason: Here C is the primary key for both r1 and r2, so all the tuples in r2 contain a possible value for C which is also present in r1, there will be maximum 600 rows.

(r1 |x| r2) |x| r3

Now, when the above result is naturally joined with r3, i.e., (r1 |x| r2) |x| r3, the size of natural join can be maximum 600 tuples.

Reason: Now, E is the primary key for r3. So, there are maximum 600 rows in r1 |x| r2 and if all the tuples from (r1 |x| r2) contain a possible value for column E, there can be 600 rows. On the other hand, if none of the rows in r1 |x| r2 contain a possible value for column E, there will be 0 rows.So, the range for r1 |x| r2 |x| r3 can lie anywhere between 0 and 600 and can be maximum 600