You are given the following collection of twelve baskets, each of which containi
ID: 3892613 • Letter: Y
Question
You are given the following collection of twelve baskets, each of which containing three of the six items 1 through 6: {1, 2, 3} {2, 3, 4} {3, 4, 5} {4, 5, 6} {1, 3, 5} {2, 4, 6} {1, 3, 4} {2, 4, 5} {3, 5, 6} {1, 2, 4} {2, 3, 5} {3, 4, 6} The support threshold is 4. On the first pass of the PCY Algorithm we use a hash table with 11 buckets, and the set {i, j} is hashed to bucket i×j mod 11. a) By any method, compute the support for each item and each pair of items. b) Which pairs hash to which buckets? c) Which buckets are frequent? d) Which pairs are counted on the second pass of the PCY Algorithm?
Explanation / Answer
In Park Chen Yu(PCY) algorithm, if we think of passing an array as hash table whose buckets hold integers rather than set of keys( as in ordinary hash table) or bits ( as in a Bloom filter).
Pairs of items are hashed to buckets of this hash table. When we see the basket in 1st pass,we not only add 1 to the count for each item in basket but we generate all pairs using double loop.At the end of the bucket,each pair has a count which is the sum of all pairs that hash to that bucket.
If the count of the bucket is less than the support threshhold, then it is called infrequent bucket. This gives an advantage to 2nd pass.
After the middle pass,fewer pairs contribute to buckets .rest data goes to pass 3. Here it count only those pairs(i,j) that match these conditions:
A) both i and j are frequent items
B) using first hash function,the pair hashes to a bucket whose bit in the first bit vector is 1.
C)using 2nd hash function,the pair hashes to a bucket whose bit in second bit vector is 1.
The two hash functions needs to be independent. Both hashes are checked on third pass.
If not we would end up counting pairs of frequent items that hashed first to an infrequent bucket during pass1 but happened to hash to a frequent bucket during the new pass2.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.