
Question

One of the services Twitter provides its users is the ability to track the most popular topics. For this part of the assignment you will do something similar. Your task is to keep track of the topics identified by users with the hashtag symbol ‘#’. You will also need to count the frequency of the hashtags you found and provide a ranking of hashtags based on their frequency. The output of your script should be one file, named top_hashtags.txt, with the N most popular hashtags, where N is a parameter to your function (Python). For example, assume this is the content of your twitter_data.txt file:

#lebron best athlete of our generation

ML 5 Demos! Lots of great stuff to come! Yes, I'm excited. :) http://htmlfive.appspot.com #io2009 #googleio

At GWT fireside chat #googleio

@khalid0456 No, Lebron is the best #lebron

If N is set to 2, then your script should generate a file top_hashtags.txt with the following content (note that in case of ties the order doesn’t matter):

#googleio 2

#lebron 2
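To see where the expected output comes from, the counting step can be tried on the four sample tweets directly. This is a minimal sketch that assumes the tweets are held in an in-memory string (`SAMPLE`, a hypothetical name) instead of being read from twitter_data.txt. The pattern `#\w+` keeps the '#' in each match, and `collections.Counter` tallies how often each match occurs.

```python
import re
from collections import Counter

# The four sample tweets from the question, held in memory for illustration
# (a stand-in for the contents of twitter_data.txt).
SAMPLE = """#lebron best athlete of our generation
ML 5 Demos! Lots of great stuff to come! Yes, I'm excited. :) http://htmlfive.appspot.com #io2009 #googleio
At GWT fireside chat #googleio
@khalid0456 No, Lebron is the best #lebron
"""

# '#' followed by one or more word characters; the '#' stays in each match.
tags = re.findall(r"#\w+", SAMPLE)

# Count how often each hashtag occurs.
counts = Counter(tags)

print(counts)                  # every hashtag with its frequency
print(counts.most_common(2))   # the N = 2 most frequent hashtags
```

Here #lebron and #googleio each occur twice and #io2009 once, so with N = 2 the two tied hashtags make up the answer, in either order.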

twitter_data.txt link : https://drive.google.com/open?id=0BzB5lIrANOIPNXJVb3ZnbksxVTg

Explanation / Answer

Solution: the script below reads twitter_data.txt, counts the hashtags, and writes the most frequent ones to top_hashtags.txt.

---------------------------------------------------

# This script extracts the top hashtags from Twitter data and writes them to a file.
# Import the relevant modules.
import re
from collections import Counter

# A hashtag is '#' followed by letters, digits or underscores; the '#' is kept in the match.
HASHTAG_PATTERN = re.compile(r"#\w+")


def top_hashtags(n, filename="twitter_data.txt", output_filename="top_hashtags.txt"):
    """Write the n most frequent hashtags in filename to output_filename."""
    # Read the Twitter data from the file.
    with open(filename, "r", encoding="utf-8") as infile:
        twitter_data = infile.read()

    # Extract all hashtags and count their frequencies.
    tags = HASHTAG_PATTERN.findall(twitter_data)
    tag_counts = Counter(tags)

    # Write the n most common hashtags, one "tag count" pair per line.
    with open(output_filename, "w", encoding="utf-8") as outfile:
        for tag, count in tag_counts.most_common(n):
            outfile.write(f"{tag} {count}\n")


if __name__ == "__main__":
    top_hashtags(2)


--------------------------------------------------------
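The assignment allows tied hashtags to appear in any order, and `Counter.most_common` lists equally common elements in the order they were first encountered. If a reproducible file is preferred, one option (an assumption, not something the assignment requires) is to break ties alphabetically. The sketch below uses the hypothetical helper names `rank_hashtags` and `format_ranking` and works on a string rather than a file, so it can be tried without twitter_data.txt.

```python
import heapq
import re
from collections import Counter


def rank_hashtags(text, n):
    """Return the n most frequent hashtags in text as (tag, count) pairs.

    Ties are broken alphabetically so repeated runs give the same output;
    this tie-break is an illustrative choice, not part of the assignment.
    """
    counts = Counter(re.findall(r"#\w+", text))
    # The smallest (-count, tag) keys are the highest counts, then alphabetical tags.
    return heapq.nsmallest(n, counts.items(), key=lambda item: (-item[1], item[0]))


def format_ranking(ranked):
    """Render (tag, count) pairs as 'tag count' lines, as in top_hashtags.txt."""
    return "".join(f"{tag} {count}\n" for tag, count in ranked)


if __name__ == "__main__":
    # The sample tweets from the question.
    sample = (
        "#lebron best athlete of our generation\n"
        "ML 5 Demos! Lots of great stuff to come! Yes, I'm excited. :) "
        "http://htmlfive.appspot.com #io2009 #googleio\n"
        "At GWT fireside chat #googleio\n"
        "@khalid0456 No, Lebron is the best #lebron\n"
    )
    print(format_ranking(rank_hashtags(sample, 2)), end="")
```

`heapq.nsmallest` avoids sorting every distinct hashtag when N is small compared with the number of distinct tags; for small inputs, `sorted(counts.items(), key=...)[:n]` with the same key gives the same result.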