In Python, using default dictionaries, thanks! The "authorship attribution" sys


Question

In Python, using default dictionaries, thanks!

An "authorship attribution" system attempts to determine who wrote a given document, based on an analysis of the language and style of that document.

The first step in our authorship attribution system will be to take a document, separate it into its component words, and construct and return a dictionary of word frequencies. As we are focused on the English language, we will assume that "words" are separated by whitespace, in the form of spaces (' '), tabs ('\t') and newline characters ('\n').

We will also do something slightly unconventional in considering each "standalone" non-alphabetic character (i.e. any character other than whitespace, or upper- or lower-case alphabetic characters) to be a single word. For example, given the document 'Dynamic-typed variables, Python; really?!!', the component words, in sequence, would be 'Dynamic-typed' (noting that '-' here is not considered to be a word despite being non-alphabetic, as it is surrounded by alphabetic characters), 'variables', ',', 'Python', ';', 'really', '?', '!', '!'. Note that if the document instead started with 'Dynamic--typed', the breakdown into words would be 'Dynamic', '-', '-', and 'typed', as each of the hyphens neighbours a non-alphabetic character. Note also that case should be preserved in the output (i.e. if a word is upper case in the original, it should remain in upper case).
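The splitting rule above can be sketched with a regular expression. This is only one possible reading of the rule, assuming ASCII letters only; the names WORD_PATTERN and split_words are hypothetical and do not come from the question.

```python
import re

# Hypothetical helper, not part of the question: one alternative per kind of word.
WORD_PATTERN = re.compile(
    r"[A-Za-z]+(?:[^A-Za-z\s][A-Za-z]+)*"  # letters, optionally joined by single enclosed characters
    r"|[^A-Za-z\s]"                        # any other non-whitespace character on its own
)

def split_words(doc):
    """Return the words of doc in order of appearance."""
    return WORD_PATTERN.findall(doc)

print(split_words('Dynamic-typed variables, Python; really?!!'))
# ['Dynamic-typed', 'variables', ',', 'Python', ';', 'really', '?', '!', '!']
print(split_words('Dynamic--typed'))
# ['Dynamic', '-', '-', 'typed']
```

A non-alphabetic character is absorbed into a word only when a letter sits directly on both sides of it, which is why the doubled hyphen falls apart into separate words.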

Write a function authattr_worddict(doc) that takes a single string argument doc and returns a dictionary (dict) of words contained in doc (as defined above), with the frequency of each word as an int. Note that, as the output is a dict, the order of those words may not correspond exactly to that indicated below, and that the testing will accept any word ordering within the dictionary.
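The counting half of the task might look like the sketch below, which assumes the document has already been split into a list of words. The helper name count_words is hypothetical; collections.defaultdict is the standard-library "default dictionary" the asker mentions.

```python
from collections import defaultdict

def count_words(words):
    """Count occurrences of each item in an already-split list of words."""
    counts = defaultdict(int)   # missing keys start at 0
    for word in words:
        counts[word] += 1
    return dict(counts)         # convert back to a plain dict, as the question requires

print(count_words(['rooly', ',', 'rooly', ',', '.']))
# {'rooly': 2, ',': 2, '.': 1}
```

Converting to dict at the end keeps the return type exactly as specified and means an empty input yields {}.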

Here are some example calls to your authattr_worddict function:

>>> authattr_worddict('Dynamic-typed variables, Python; really?!!')
{'Dynamic-typed': 1, 'Python': 1, 'really': 1, '!': 2, 'variables': 1, '?': 1, ',': 1, ';': 1}
>>> authattr_worddict('')
{}
>>> authattr_worddict("Truly, rooly, rooly, indisputably 'tis ..... Gr00vy")
{"'": 1, 'vy': 1, '.': 5, '0': 2, 'tis': 1, 'rooly': 2, 'Truly': 1, 'indisputably': 1, 'Gr': 1, ',': 3}

Explanation / Answer

from collections import defaultdict

def authattr_worddict(doc):
    """Return a dict mapping each word in doc to its frequency."""
    counts = defaultdict(int)
    for token in doc.split():  # split on spaces, tabs and newlines
        word = ''
        for i, ch in enumerate(token):
            # A non-alphabetic character stays inside a word only when both
            # of its neighbours are alphabetic (e.g. 'Dynamic-typed').
            enclosed = (0 < i < len(token) - 1
                        and token[i - 1].isalpha() and token[i + 1].isalpha())
            if ch.isalpha() or enclosed:
                word += ch
            else:
                if word:  # close off the word built so far
                    counts[word] += 1
                    word = ''
                counts[ch] += 1  # a standalone character is its own word
        if word:
            counts[word] += 1
    return dict(counts)  # return (not print) a plain dict

print(authattr_worddict("Truly, rooly, rooly, indisputably 'tis ..... Gr00vy"))
print(authattr_worddict('Dynamic-typed variables, Python; really?!!'))
