Given P={p_1, p_2, ..., p_m}, the set of products available at a local grocery s
ID: 642968 • Letter: G
Question
Given
P={p_1, p_2, ..., p_m}, the set of products available at a local grocery store;
D={T_1, T_2, ..., T_N}, the set of transactions that were completed at this store in the past year, where each T_i (i in [1,N]) is a subset of P;
min_freq, a user-specified parameter (e.g., 5%).
A product is said to be frequent if it occurs in at least min_freq of the transactions in D. By the same token, a pair of products is said to be frequent if it occurs in at least min_freq of the transactions in D. Also, it's straightforward to prove the following claim: if a product is not frequent, none of the pairs that involve this product will be frequent. Use this claim to design an algorithm to find all the frequent products and frequent product pairs in D. Represent your algorithm in pseudo code.
Explanation / Answer
Step1: Count the number of items present in each transaction and put in an array a[ ].
Step2: Find the transactions having maximum items (max) in the array a[ ].
Step3: If Count (max (a[ ]) ) ? min_val then transfer those transactions to an another array arr[ ][ ],else find
subsets.
Step4: Compare each transaction in arr[ ][ ] with other transactions.
Step5: Take a Counter C and increase the counter if we found similar itemsets in arr[ ][ ].
Step6: If {C ? min_val} then itemset will be the most frequent itemset.
Step7: if {C<min_val} then find the subsets of all transactions and store it in an array sub[ ][ ].
Step8: max = max-1.
Step9: add the transactions of sub[ ][ ] to arr[ ][ ].
Step10: Repeat from step3 until frequent itemset is found
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.