Given a data set with five transactions, each containing five items, as shown in

ID: 3890433 • Letter: G

Question

Given a data set with five transactions, each containing five items, as shown in the table.

(a) What is the maximum number of possible frequent itemsets?

(b) Let min_support = 50%. Find all frequent itemsets using the Apriori algorithm. Your answer should include the key steps of the computation process.

(d) Let n be the total number of transactions, b be the number of items in each transaction, and m be the number of k-itemset candidates. Consider the following two approaches for counting the support values of the candidates. For each transaction, the first approach checks whether each candidate occurs in the transaction; the second approach enumerates all possible k-itemsets of the transaction and checks whether each one is a candidate. What is the computational complexity of each approach? Is one always better than the other?

TID    items_bought
T1     {A, H, K, T, X}
T2     {A, H, X, T, Z}
T3     {A, B, D, R, S}
T4     {B, H, S, T, X}
T5     {B, H, G, M, S}

Explanation / Answer

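In each round, the Apriori algorithm generates candidate itemsets from the frequent itemsets of the previous round, scans the database to count their support, and keeps those that meet min_support. It repeats this step until no further candidate itemsets can be generated.

Apply the Apriori algorithm: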

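Round 1:

Candidate 1-itemsets:

Itemset    Sup.count
{A}        3
{B}        3
{D}        1
{G}        1
{H}        4
{K}        1
{M}        1
{R}        1
{S}        3
{T}        3
{X}        3
{Z}        1

With min_support = 50% of 5 transactions, an itemset is frequent when its support count is at least 3. Frequent 1-itemsets:

Itemset    Sup.count
{A}        3
{B}        3
{H}        4
{S}        3
{T}        3
{X}        3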
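The Round 1 counts can be reproduced with a short script. This is a minimal sketch, assuming the transactions are held in a Python dict keyed by TID; the names `transactions`, `min_count`, `c1`, and `l1` are illustrative and not part of the original answer.

```python
from collections import Counter
from math import ceil

# Transactions copied from the table in the question.
transactions = {
    "T1": {"A", "H", "K", "T", "X"},
    "T2": {"A", "H", "X", "T", "Z"},
    "T3": {"A", "B", "D", "R", "S"},
    "T4": {"B", "H", "S", "T", "X"},
    "T5": {"B", "H", "G", "M", "S"},
}

# 50% of 5 transactions is 2.5, so a frequent itemset needs a count of at least 3.
min_count = ceil(0.5 * len(transactions))

# One pass over the data counts every single item (the candidate 1-itemsets).
c1 = Counter(item for items in transactions.values() for item in items)

# Keep only the items that reach the threshold (the frequent 1-itemsets).
l1 = {item: count for item, count in c1.items() if count >= min_count}

print(sorted(c1.items()))
print(sorted(l1.items()))
```

Only the six items that appear in at least three transactions survive the filter.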

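Round 2:

Candidate 2-itemsets, formed from the frequent 1-itemsets:

Itemset    Sup.count
{A,B}      1
{A,H}      2
{A,S}      1
{A,T}      2
{A,X}      2
{B,H}      2
{B,S}      3
{B,T}      1
{B,X}      1
{H,S}      2
{H,T}      3
{H,X}      3
{S,T}      1
{S,X}      1
{T,X}      3

Frequent 2-itemsets (support count of at least 3):

Itemset    Sup.count
{B,S}      3
{H,T}      3
{H,X}      3
{T,X}      3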
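The Round 2 step, pairing up the frequent single items and counting each pair, can be sketched as follows. The helper `support_count` and the variable names are assumptions made for illustration; the threshold of 3 comes from min_support = 50% of five transactions.

```python
from itertools import combinations

# Transactions copied from the table in the question.
transactions = [
    {"A", "H", "K", "T", "X"},
    {"A", "H", "X", "T", "Z"},
    {"A", "B", "D", "R", "S"},
    {"B", "H", "S", "T", "X"},
    {"B", "H", "G", "M", "S"},
]
min_count = 3
frequent_items = ["A", "B", "H", "S", "T", "X"]  # result of Round 1


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


# Every pair of frequent items is a candidate 2-itemset.
c2 = {pair: support_count(pair, transactions)
      for pair in combinations(frequent_items, 2)}

# Keep the pairs that meet the threshold.
l2 = {pair: n for pair, n in c2.items() if n >= min_count}

print(l2)
```

Six frequent items give 15 candidate pairs, of which four meet the threshold.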

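Round 3:

Candidate 3-itemsets, formed by extending each frequent 2-itemset with another frequent item:

Itemset    Sup.count
{B,S,H}    2
{B,S,T}    1
{B,S,X}    1
{H,T,B}    1
{H,T,S}    1
{H,T,X}    3
{H,X,B}    1
{H,X,S}    1
{T,X,B}    1
{T,X,S}    1

With Apriori's prune step, every candidate except {H,T,X} can be discarded before counting, because each of the others contains an infrequent 2-itemset (for example, {B,S,H} contains {B,H}, whose support count is 2).

Frequent 3-itemsets (support count of at least 3):

Itemset    Sup.count
{H,T,X}    3

No 4-itemset candidate can be generated from a single frequent 3-itemset, so the algorithm stops here.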

Therefore, all frequent itemsets = {{A}, {B}, {H}, {S}, {T}, {X}, {B,S}, {H,T}, {H,X}, {T,X}, {H,T,X}}
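The whole computation can be checked end to end with a compact level-wise loop. This is a sketch of the general Apriori scheme (count, filter, join, prune), not the answerer's own code; the function name `apriori` and its signature are hypothetical, and the join here simply unions pairs of frequent k-itemsets rather than using the prefix-based join found in textbook pseudocode.

```python
from itertools import combinations

# Transactions copied from the table in the question.
transactions = [
    {"A", "H", "K", "T", "X"},
    {"A", "H", "X", "T", "Z"},
    {"A", "B", "D", "R", "S"},
    {"B", "H", "S", "T", "X"},
    {"B", "H", "G", "M", "S"},
]


def apriori(transactions, min_count):
    """Return {frozenset: support_count} for every frequent itemset."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # One scan of the data counts the current candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join: union pairs of frequent k-itemsets that give a (k+1)-itemset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


result = apriori(transactions, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this data the loop runs three counting passes and returns 11 frequent itemsets, with {H,T,X} as the only 3-itemset.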

Three database scans are required to find all frequent itemsets, one for each candidate set (1-itemsets, 2-itemsets, and 3-itemsets).

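First approach: for each of the n transactions, each of the m candidates is checked against the b items of the transaction. Thus, this approach takes O(nmb).

Second approach: each transaction of b items has C(b, k) = b! / (k! (b - k)!) possible k-itemsets, and each one is looked up among the m candidates. If the candidates are stored in a hash table, a lookup costs O(k), so this approach takes O(n k C(b, k)), which does not depend on m.

Therefore, neither approach is always better than the other. The first is cheaper when m is small relative to C(b, k); the second is cheaper when transactions are short and the number of candidates is large.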
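The two counting approaches from part (d) can be sketched side by side. The function names `count_by_candidates` and `count_by_enumeration` are hypothetical, and the second function assumes the candidates are kept in a hash-based dict so that each lookup is cheap; a different data structure would change its cost.

```python
from itertools import combinations

# Transactions copied from the table in the question.
transactions = [
    {"A", "H", "K", "T", "X"},
    {"A", "H", "X", "T", "Z"},
    {"A", "B", "D", "R", "S"},
    {"B", "H", "S", "T", "X"},
    {"B", "H", "G", "M", "S"},
]

# The 15 candidate 2-itemsets built from the frequent single items.
k = 2
candidates = [frozenset(p) for p in combinations("ABHSTX", k)]


def count_by_candidates(transactions, candidates):
    """Approach 1: test every candidate against every transaction (n * m tests)."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def count_by_enumeration(transactions, candidates, k):
    """Approach 2: enumerate each transaction's k-itemsets and look them up."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:  # hash lookup, independent of the number of candidates
                counts[key] += 1
    return counts


first = count_by_candidates(transactions, candidates)
second = count_by_enumeration(transactions, candidates, k)
print(first == second)
```

Both functions return the same counts. Here n = 5, m = 15, b = 5, and k = 2, so the first approach performs 75 subset tests while the second performs 5 × C(5, 2) = 50 lookups; with fewer candidates or longer transactions the balance shifts the other way.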
