Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Given a data set with five transactions, each containing five items, as shown in

ID: 3890433 • Letter: G

Question

Given a data set with five transactions, each containing five items, as shown in the table.



(a) What is the maximum number of possible frequent itemsets?

b) Let min_support = 50%. Find all frequent itemsets using the Apriori algorithm. Your answer should include the key steps of the computation process.

(c) In the computation (b) above, how many rounds of database scan are needed? What is the total number of candidates?

(d) Let n be the total number of transactions, b be the number of items in each transaction, m be the number of k-itemset candidates. Consider the following two different approaches for counting the support values of the candidates. For each transaction, the first approach checks if a candidate occurred in the transaction or not; the second approach enumerates all the possible k-itemsets of the transaction and checks if the itemset is one of the candidates. What is the computation complexity for each approach? Is one always better than the other?

TID items_bought T1 {A, H, K, T, X} T2 {A, H, X, T, Z} T3 {A, B, D, R, S} T4 {B, H, S, T, X} T5 {B, H, G, M, S}

Explanation / Answer

a)

b)

The algorithm repeats the above step until, there is no possibility to generate candidates set.

Apply Apriori algorithm:

Round 1:

C1

Itemset

Sup.count

{A}

3

{B}

3

{D}

1

{G}

1

{H}

4

{K}

1

{M}

1

{R}

1

{S}

3

{T}

3

{X}

3

{Z}

1

L1

Itemset

Sup.count

{A}

3

{B}

3

{H}

4

{S}

3

{T}

3

{X}

3

Round 2:

C2

Itemset

Sup.count

{A,B}

1

{A,H}

2

{A,S}

1

{A,T}

2

{A,X}

2

{B,H}

2

{B,S}

3

{B,T}

1

{B,X}

1

{H,S}

2

{H,T}

3

{H,X}

3

{S,T}

1

{S,X}

1

{T,X}

3

L2

Itemset

Sup.count

{B,S}

3

{H,T}

3

{H,X}

3

{T,X}

3

Round 2:

C3

Itemset

Sup.count

{B,S,H}

2

{B,S,T}

1

{B,S,X}

1

{H,T,B}

1

{H,T,S}

1

{H,T,X}

3

{H,X,B}

1

{H,X,S}

1

{T,X,B}

1

{T,X,S}

1

L3

Itemset

Sup.count

{H,T,X}

3

Therefore, all frequent itemsets={{A},{B},{H},{S},{T},{X},{B,S},{H,T},{H,X},{T,X} ,{H,T,X}}

3 rounds of data base scan are required to find all frequent itemsets and the total candidates are 3.

d)

Thus, this approach takes O(nmb).

Thus this approach takes O(nm2b).

Therefore, obviously the first approach is always better than the second approach.

Itemset

Sup.count

{A}

3

{B}

3

{D}

1

{G}

1

{H}

4

{K}

1

{M}

1

{R}

1

{S}

3

{T}

3

{X}

3

{Z}

1

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote