The next step in our authorship attribution system will be to take two dictionar

Question

The next step in our authorship attribution system will be to take two dictionaries of word counts and calculate the similarity between them. We will do this by:

ranking the two sets of words in descending order of frequency; and

for corresponding word pairs, calculating the absolute difference in rank between the two.

If a word is found in one ranking but not the other, we will set the ranking for the second to the value maxrank (provided as part of the function call). In the case of a tie in the word frequency ranking (due to multiple words having the same frequency), we will assign all tied items the same value: the average of the ranking positions that they jointly occupy.

For example, if two items were tied for second, we would assign each of them the rank (2 + 3) / 2 = 2.5. The ranking of the next item would then be 4 rather than 3, as two places in the ranking have been taken. For example, if the first dictionary was {'a': 10, 'b': 5, 'c': 5, 'd': 2, 'e': 2, 'f': 2, 'g': 1} (i.e. 'a' occurs 10 times, 'b' 5 times, etc.), then 'a' would be ranked 1, 'b' and 'c' 2.5, 'd', 'e' and 'f' 5, and 'g' 7.

Note that 'd', 'e' and 'f' are assigned a ranking of 5 because they are all tied for fourth (three items precede them), and (4 + 5 + 6) / 3 = 5.
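The tie rule above can be sketched in Python as follows. This is only an illustration and not part of the original question: the helper name average_ranks is hypothetical, and the sketch assumes that tied words share the mean of the positions they jointly occupy.

```python
from itertools import groupby


def average_ranks(dictfreq):
    """Rank words by descending frequency; tied words share the mean position.

    `average_ranks` is a hypothetical helper name used for illustration only.
    """
    # Sort (word, frequency) pairs from most to least frequent.
    ordered = sorted(dictfreq.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    position = 1  # 1-based position of the next unranked word
    # Consecutive pairs with the same frequency form one tied group.
    for _, group in groupby(ordered, key=lambda item: item[1]):
        words = [word for word, _ in group]
        # The group occupies positions position .. position + len(words) - 1,
        # whose mean is position + (len(words) - 1) / 2.
        shared = position + (len(words) - 1) / 2
        for word in words:
            ranks[word] = shared
        position += len(words)
    return ranks


example_ranks = average_ranks(
    {'a': 10, 'b': 5, 'c': 5, 'd': 2, 'e': 2, 'f': 2, 'g': 1}
)
print(example_ranks)
```

For the example dictionary this gives 'b' and 'c' the rank 2.5 and 'd', 'e' and 'f' the rank 5, matching the values described above.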

If the second ranking were:

Then the combined ranking would be:

The final step is to calculate the "out-of-place" distance between the two rankings, by calculating the total absolute difference between the respective rankings for each word contained in the union of the rankings. This is just a complicated way of saying: for each row in the combined ranking, calculate the absolute difference between the two ranking values (e.g. for 'a'), and sum up across all the rows. Assuming that maxrank is equal to 10, the value for the case above would be:
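As a rough sketch of this final step, the snippet below sums the rank differences over the union of two rankings. The first ranking is the one for the example dictionary in the question. The second ranking is an assumption: it is derived from the dictionary {'b': 27, 'h': 22, 'a': 11, 'i': 11, 'j': 5} used in the first test call of the answer, because the original table did not survive on this page. The helper name out_of_place is hypothetical.

```python
def out_of_place(ranks1, ranks2, maxrank):
    """Sum |rank1 - rank2| over every word that appears in either ranking.

    `out_of_place` is a hypothetical helper; a word missing from one
    ranking is given the rank `maxrank` on that side.
    """
    total = 0.0
    for word in set(ranks1) | set(ranks2):
        total += abs(ranks1.get(word, maxrank) - ranks2.get(word, maxrank))
    return total


# Ranking of the example dictionary from the question.
first = {'a': 1, 'b': 2.5, 'c': 2.5, 'd': 5, 'e': 5, 'f': 5, 'g': 7}
# ASSUMPTION: ranking of {'b': 27, 'h': 22, 'a': 11, 'i': 11, 'j': 5}.
second = {'b': 1, 'h': 2, 'a': 3.5, 'i': 3.5, 'j': 5}

distance = out_of_place(first, second, 10)
print(distance)  # 49.0
```

Under that assumption the row for 'a' contributes |1 - 3.5| = 2.5, and the total with maxrank equal to 10 comes to 49.0.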

Write a function authattr_oop(dictfreq1, dictfreq2, maxrank) that takes three arguments:

dictfreq1: a dictionary of words, with the (positive integer) frequency of each

dictfreq2: a second dictionary of words, with the (positive integer) frequency of each

maxrank: the positive int value to set the ranking to in the case that the word isn't in the dictionary of words in question

and returns a float out-of-place distance between the two (where the smaller the number, the more similar the two rankings are).

Here are some example calls to your authattr_oop function:

Ranking for the example dictionary {'a': 10, 'b': 5, 'c': 5, 'd': 2, 'e': 2, 'f': 2, 'g': 1}:

word  frequency  ranking
'a'   10         1
'b'   5          2.5
'c'   5          2.5
'd'   2          5
'e'   2          5
'f'   2          5
'g'   1          7

Explanation / Answer

Code

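import operator


def corresponding_ranking(dictfreq):
    # dictfreq is a list of (word, frequency) pairs that is already
    # sorted by descending frequency.
    # Count how many words share each frequency so ties can be detected.
    freq_char = {}
    for data in dictfreq:
        if data[1] in freq_char:
            freq_char[data[1]] = freq_char[data[1]] + 1
        else:
            freq_char[data[1]] = 1

    # Last ranking position handed out so far. Starting at 0 (not 1)
    # keeps a tie for first place correct, e.g. two words -> 1.5 each.
    rank = 0
    rank_list = {}

    i = 0
    while i < len(dictfreq):
        n = freq_char[dictfreq[i][1]]
        if n == 1:
            # Unique frequency: the rank is the 1-based position.
            rank = i + 1
            rank_list[dictfreq[i][0]] = rank
            i = i + 1
        else:
            # Tie: add up the n positions that the tied words occupy ...
            k = 0
            temp_rank = 0
            while k < n:
                k = k + 1
                rank = rank + 1
                temp_rank = temp_rank + rank

            # ... and give every tied word the average of those positions.
            k = 0
            while k < n:
                rank_list[dictfreq[i][0]] = temp_rank / float(n)
                i = i + 1
                k = k + 1
    return rank_list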


def authattr_oop(dictfreq1, dictfreq2, maxrank):
    # Keep the original dictionaries for membership tests.
    temp_freq1 = dictfreq1
    temp_freq2 = dictfreq2
    # Sort the (word, frequency) pairs by descending frequency.
    dictfreq1 = sorted(dictfreq1.items(), key=operator.itemgetter(1), reverse=True)
    dictfreq2 = sorted(dictfreq2.items(), key=operator.itemgetter(1), reverse=True)

    rank_list1 = corresponding_ranking(dictfreq1)
    rank_list2 = corresponding_ranking(dictfreq2)

    ans = float(0)
    # Words in the first dictionary: compare against the second ranking,
    # or against maxrank if the word is missing there.
    for char in temp_freq1:
        if char in temp_freq2:
            ans += abs(rank_list1[char] - rank_list2[char])
        else:
            ans += abs(rank_list1[char] - maxrank)

    # Words that appear only in the second dictionary.
    for char in temp_freq2:
        if char not in temp_freq1:
            ans += abs(rank_list2[char] - maxrank)
    return ans


if __name__ == '__main__':
    output1 = authattr_oop({'a': 10, 'b': 5, 'c': 5, 'd': 2, 'e': 2, 'f': 2, 'g': 1}, {'b': 27, 'h': 22, 'a': 11, 'i': 11, 'j': 5}, 10)
    print("First test case output is: ", output1)
    output2 = authattr_oop({'a': 5000, 'b': 4000, 'c': 3000}, {'a': 5, 'b': 4, 'c': 3}, 100)
    print("Second test case output is: ", output2)
    output3 = authattr_oop({'a': 5000, 'b': 4000, 'c': 3000}, {'d': 5, 'e': 4, 'f': 3}, 100)
    print("Third test case output is: ", output3)
