Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Python Question!!! The aim of this exercise is to find those pairs of products t

ID: 671643 • Letter: P

Question

Python Question!!!

The aim of this exercise is to find those pairs of products that appear most frequently in customers' bills.

You are given a list of shopping bills, in the same format as Question 1. You can assume that all inputs are of the correct type.

Your goal is to write a function that finds the product pairs that occur most frequently in the given bills. Thus, your function would return [['eggs', 'milk']] given the list of bills [['milk','eggs'],['milk','milk','tofu', 'eggs'],['eggs']]. In this case, the pair 'eggs' and 'milk' appeared in two bills. Note that the items in each bill should be treated as unique when creating pairs. For example, in the second bill in the example above, the possible pairs are ['milk', 'tofu'], ['eggs', 'milk']and ['eggs', 'tofu']. You should not include ['milk', 'milk'].

You should write a function mostFrequentPairs(bills) that takes a list of shopping bills bills and returns the list of lists of products that appear most frequently together. If there are two or more pairs of products that have the highest frequency, then the return list containing those pairs of products. Each pair of products should be in sorted order, and the list of lists should also be in sorted order. If you are given an empty list of bills, you should return the empty list. Note that you should not use the itertools.combinations function in the implementation of your function.

Example calls to the function are:

>>> print(mostFrequentPairs([['milk','eggs'],['milk','milk','tofu', 'eggs'],['eggs']]))
[['eggs', 'milk']]
>>> print(mostFrequentPairs([['milk','eggs'],['milk','milk','tofu', 'eggs'],['tofu','eggs']]))
[['eggs', 'milk'], ['eggs', 'tofu']]
>>> print(mostFrequentPairs([['milk','eggs'],['milk','milk','tofu', 'eggs'],[]]))
[['eggs', 'milk']]
>>> print(mostFrequentPairs([['milk'],['eggs']]))
[]

Explanation / Answer

The given problem is nothing but the finding frequent item sets . this is a data mining concept. This program may help you in getting the output

db = []

item_count = {}

cnt = 0

def grow(itemsets):

r = []

i = 0

j = 1

while i < len(itemsets):

while j < len(itemsets) and itemsets[i][:-1] == itemsets[j][:-1]:

j += 1

for p in range(i, j):

for q in range(p+1, j):

r.append(itemsets[p]+itemsets[q][-1:])

i = j

return r

def contain(seta, setb):

i,j = 1, 0

while j < len(setb):

if i == len(seta): return False

if seta[i] > setb[j]: return False

elif seta[i] < setb[j]: i+=1

else: i, j = i+1, j+1

return True

def check(candidates):

r = []

candidates = map(tuple, candidates)

cnt = {}

for c in candidates: cnt[c] = 0

for transaction in db:

# print transaction

for candidate in candidates:

# print transaction, candidate, contain(transaction, candidate)

if cnt[candidate] < threshold and contain(transaction, candidate):

cnt[candidate] += 1

if cnt[candidate] >= threshold:

r.append(candidate)

print candidate

return r

import time

print time.asctime()

for line in open("f:/data/nausea.dat"):

parts = line.split()

db.append(parts[:1]+sorted(parts[1:]))

for item in db[-1][1:]:

# print item

item_count.setdefault(item, 0)

item_count[item] += 1

cnt += 1

threshold = 600

fitems = [item for item in item_count if item_count[item] >= threshold]

print len(db)

print len(fitems)

fitemsets = [[item] for item in sorted(fitems)]

print fitemsets

while len(fitemsets) > 0:

candidates = grow(fitemsets)

fitemsets = check(candidates)

print len(candidates), len(fitemsets), fitemsets

print time.asctime()

print contain(['f', 1,2, 3], [1,2,3])

db = []

item_count = {}

cnt = 0

def grow(itemsets):

r = []

i = 0

j = 1

while i < len(itemsets):

while j < len(itemsets) and itemsets[i][:-1] == itemsets[j][:-1]:

j += 1

for p in range(i, j):

for q in range(p+1, j):

r.append(itemsets[p]+itemsets[q][-1:])

i = j

return r

def contain(seta, setb):

i,j = 1, 0

while j < len(setb):

if i == len(seta): return False

if seta[i] > setb[j]: return False

elif seta[i] < setb[j]: i+=1

else: i, j = i+1, j+1

return True

def check(candidates):

r = []

candidates = map(tuple, candidates)

cnt = {}

for c in candidates: cnt[c] = 0

for transaction in db:

# print transaction

for candidate in candidates:

# print transaction, candidate, contain(transaction, candidate)

if cnt[candidate] < threshold and contain(transaction, candidate):

cnt[candidate] += 1

if cnt[candidate] >= threshold:

r.append(candidate)

print candidate

return r

import time

print time.asctime()

for line in open("f:/data/nausea.dat"):

parts = line.split()

db.append(parts[:1]+sorted(parts[1:]))

for item in db[-1][1:]:

# print item

item_count.setdefault(item, 0)

item_count[item] += 1

cnt += 1

threshold = 600

fitems = [item for item in item_count if item_count[item] >= threshold]

print len(db)

print len(fitems)

fitemsets = [[item] for item in sorted(fitems)]

print fitemsets

while len(fitemsets) > 0:

candidates = grow(fitemsets)

fitemsets = check(candidates)

print len(candidates), len(fitemsets), fitemsets

print time.asctime()

print contain(['f', 1,2, 3], [1,2,3])