
For this assignment, you need to use Python to compute the term-frequency matrix


Question

For this assignment, you need to use Python to compute the term-frequency matrix for a set of documents.

A term-frequency matrix is a table in which rows represent documents and columns represent terms (words). The value in cell (i, j) is the number of times that word j occurs in document i.
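As a small illustration of the definition, here is a sketch that builds the matrix for two made-up documents held in memory. The document texts, the whitespace tokenization, and the sorted column order are assumptions chosen for the example, not part of the assignment.

```python
from collections import Counter

# Two made-up documents (assumption: words are separated by whitespace).
docs = ["good movie good cast", "bad movie"]

# Columns: every unique term, sorted so the column order is reproducible.
terms = sorted({word for doc in docs for word in doc.split()})

# Rows: one per document; cell (i, j) counts term j in document i.
matrix = [[Counter(doc.split())[term] for term in terms] for doc in docs]

print(terms)   # ['bad', 'cast', 'good', 'movie']
print(matrix)  # [[0, 1, 2, 1], [1, 0, 0, 1]]
```

The first row shows that "good" occurs twice in the first document, and the second row shows that "cast" and "good" do not occur in the second document at all.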

To do this, your Python program first needs to go through the files in the input folder, where each file is a separate document (thus, the number of documents is the number of files), and build a set of all unique terms across all the documents.

Let's call this list of terms T, which contains n terms.
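One possible way to build T is sketched below. The function name `build_vocabulary`, the UTF-8 encoding, the lowercasing, and the whitespace tokenization are all assumptions made for this sketch; the assignment does not specify how words should be split or normalized.

```python
import os

def build_vocabulary(folder):
    """Return the sorted list of unique terms found in the files of `folder`.

    Hypothetical helper: assumes UTF-8 text files, lowercases every word,
    and splits on whitespace only.
    """
    terms = set()
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)  # listdir gives bare names
        if os.path.isfile(full_path):           # skip sub-folders
            with open(full_path, encoding="utf-8") as handle:
                terms.update(handle.read().lower().split())
    return sorted(terms)
```

Returning a sorted list rather than the raw set fixes the column order, so the same term always maps to the same column j.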

Then you'll need to go through each file/document and compute the number of times that each of the n words occurs in that document. Doing this for every document produces the term-frequency matrix.
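The counting step for a single document might look like the sketch below. The helper name `count_terms` and the example term list are hypothetical, and the text is again assumed to be split on whitespace.

```python
from collections import Counter

def count_terms(text, terms):
    """Return one matrix row: the count of each term of `terms` in `text`."""
    counts = Counter(text.split())             # a missing term counts as 0
    return [counts[term] for term in terms]

T = ["bad", "good", "movie"]                   # example term list, n = 3
row = count_terms("good movie good", T)
print(row)  # [0, 2, 1]
```

Calling `count_terms` once per document and collecting the rows in a list gives the full matrix.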

The program should save this matrix in a file, where each row of the matrix appears on a separate line and the term frequencies in a row are separated by commas.
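A minimal sketch of the output format, using the standard `csv` module, is shown here. The function name `save_matrix` and the choice of a plain "\n" line ending are assumptions; the assignment only asks for one row per line with comma-separated values.

```python
import csv

def save_matrix(matrix, out_path):
    """Write each row of `matrix` on its own line, values comma-separated."""
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerows(matrix)
```

For a matrix such as `[[0, 1, 2], [1, 0, 0]]`, the file would contain the two lines `0,1,2` and `1,0,0`.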

The folder with the documents, representing movie reviews, is included in the assignment.

Here are the stop words:

'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', 'couldn', 'didn', 'doesn', 'hadn', 'hasn', 'haven', 'isn', 'ma', 'mightn', 'mustn', 'needn', 'shan', 'shouldn', 'wasn', 'weren', 'won', 'wouldn'
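The question does not say how the stop words are to be used. A common reading is that they are removed before the terms are collected and counted, and the sketch below assumes that reading. `STOP_WORDS` holds only a short subset of the list above to keep the example small, and `remove_stop_words` is a hypothetical helper name.

```python
# Abbreviated subset of the stop-word list, for illustration only.
STOP_WORDS = {"i", "me", "my", "the", "and", "is", "a"}

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words that are not stop words (case-insensitive check)."""
    return [word for word in words if word.lower() not in stop_words]

print(remove_stop_words("The movie is a classic".split()))
# ['movie', 'classic']
```

Storing the stop words in a set keeps each membership check fast even when the full list is used.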


Explanation / Answer

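import os
from collections import Counter

def wordCount(fileName):
    # Count how many times each whitespace-separated word occurs in one file.
    with open(fileName, "r") as file:
        return Counter(file.read().split())

#give folder path here ("input" is only a placeholder)
path = "input"

# Count the words of every regular file in the folder, in sorted name order.
counts = {}
for name in sorted(os.listdir(path)):
    fullPath = os.path.join(path, name)  # os.listdir returns bare file names
    if os.path.isfile(fullPath):
        counts[name] = wordCount(fullPath)

# T: every unique term across all documents, sorted so the column order is fixed.
allKeys = sorted(set().union(*counts.values()))

# m[i][j] is the number of times term j occurs in document i.
m = [[counts[name][key] for key in allKeys] for name in counts]

# Save the matrix: one row per line, values separated by commas.
# The output file name is a placeholder.
with open("term_frequency_matrix.csv", "w") as out:
    for row in m:
        out.write(",".join(str(value) for value in row) + "\n")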
