Python Question!!! The aim of this exercise is to find those pairs of products t
ID: 671643 • Letter: P
Question
Python Question!!!
The aim of this exercise is to find those pairs of products that appear most frequently in customers' bills.
You are given a list of shopping bills, in the same format as Question 1. You can assume that all inputs are of the correct type.
Your goal is to write a function that finds the product pairs that occur most frequently in the given bills. Thus, your function would return [['eggs', 'milk']] given the list of bills [['milk','eggs'],['milk','milk','tofu', 'eggs'],['eggs']]. In this case, the pair 'eggs' and 'milk' appeared in two bills. Note that the items in each bill should be treated as unique when creating pairs. For example, in the second bill in the example above, the possible pairs are ['milk', 'tofu'], ['eggs', 'milk']and ['eggs', 'tofu']. You should not include ['milk', 'milk'].
You should write a function mostFrequentPairs(bills) that takes a list of shopping bills bills and returns the list of lists of products that appear most frequently together. If there are two or more pairs of products that have the highest frequency, then the return list containing those pairs of products. Each pair of products should be in sorted order, and the list of lists should also be in sorted order. If you are given an empty list of bills, you should return the empty list. Note that you should not use the itertools.combinations function in the implementation of your function.
Example calls to the function are:
>>> print(mostFrequentPairs([['milk','eggs'],['milk','milk','tofu', 'eggs'],['eggs']]))
[['eggs', 'milk']]
>>> print(mostFrequentPairs([['milk','eggs'],['milk','milk','tofu', 'eggs'],['tofu','eggs']]))
[['eggs', 'milk'], ['eggs', 'tofu']]
>>> print(mostFrequentPairs([['milk','eggs'],['milk','milk','tofu', 'eggs'],[]]))
[['eggs', 'milk']]
>>> print(mostFrequentPairs([['milk'],['eggs']]))
[]
Explanation / Answer
The given problem is nothing but the finding frequent item sets . this is a data mining concept. This program may help you in getting the output
db = []
item_count = {}
cnt = 0
def grow(itemsets):
r = []
i = 0
j = 1
while i < len(itemsets):
while j < len(itemsets) and itemsets[i][:-1] == itemsets[j][:-1]:
j += 1
for p in range(i, j):
for q in range(p+1, j):
r.append(itemsets[p]+itemsets[q][-1:])
i = j
return r
def contain(seta, setb):
i,j = 1, 0
while j < len(setb):
if i == len(seta): return False
if seta[i] > setb[j]: return False
elif seta[i] < setb[j]: i+=1
else: i, j = i+1, j+1
return True
def check(candidates):
r = []
candidates = map(tuple, candidates)
cnt = {}
for c in candidates: cnt[c] = 0
for transaction in db:
# print transaction
for candidate in candidates:
# print transaction, candidate, contain(transaction, candidate)
if cnt[candidate] < threshold and contain(transaction, candidate):
cnt[candidate] += 1
if cnt[candidate] >= threshold:
r.append(candidate)
print candidate
return r
import time
print time.asctime()
for line in open("f:/data/nausea.dat"):
parts = line.split()
db.append(parts[:1]+sorted(parts[1:]))
for item in db[-1][1:]:
# print item
item_count.setdefault(item, 0)
item_count[item] += 1
cnt += 1
threshold = 600
fitems = [item for item in item_count if item_count[item] >= threshold]
print len(db)
print len(fitems)
fitemsets = [[item] for item in sorted(fitems)]
print fitemsets
while len(fitemsets) > 0:
candidates = grow(fitemsets)
fitemsets = check(candidates)
print len(candidates), len(fitemsets), fitemsets
print time.asctime()
print contain(['f', 1,2, 3], [1,2,3])
db = []
item_count = {}
cnt = 0
def grow(itemsets):
r = []
i = 0
j = 1
while i < len(itemsets):
while j < len(itemsets) and itemsets[i][:-1] == itemsets[j][:-1]:
j += 1
for p in range(i, j):
for q in range(p+1, j):
r.append(itemsets[p]+itemsets[q][-1:])
i = j
return r
def contain(seta, setb):
i,j = 1, 0
while j < len(setb):
if i == len(seta): return False
if seta[i] > setb[j]: return False
elif seta[i] < setb[j]: i+=1
else: i, j = i+1, j+1
return True
def check(candidates):
r = []
candidates = map(tuple, candidates)
cnt = {}
for c in candidates: cnt[c] = 0
for transaction in db:
# print transaction
for candidate in candidates:
# print transaction, candidate, contain(transaction, candidate)
if cnt[candidate] < threshold and contain(transaction, candidate):
cnt[candidate] += 1
if cnt[candidate] >= threshold:
r.append(candidate)
print candidate
return r
import time
print time.asctime()
for line in open("f:/data/nausea.dat"):
parts = line.split()
db.append(parts[:1]+sorted(parts[1:]))
for item in db[-1][1:]:
# print item
item_count.setdefault(item, 0)
item_count[item] += 1
cnt += 1
threshold = 600
fitems = [item for item in item_count if item_count[item] >= threshold]
print len(db)
print len(fitems)
fitemsets = [[item] for item in sorted(fitems)]
print fitemsets
while len(fitemsets) > 0:
candidates = grow(fitemsets)
fitemsets = check(candidates)
print len(candidates), len(fitemsets), fitemsets
print time.asctime()
print contain(['f', 1,2, 3], [1,2,3])
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.