Use PYTHON to solve this: Given a file of “documents” where each document occupi
Question
Use PYTHON to solve this:
Given a file of “documents” where each document occupies one line of the file, you are to build a data structure (called an inverse index) that allows you to identify the documents containing a given word. We will identify the documents by document number: the document represented by the first line of the file is document number 0, that represented by the second line is document number 1, and so on. Make matching case-insensitive (e.g. Wall and wall are the same word). Note that the period is considered part of a substring. To make this easier, we have a file of documents in which punctuation is separated from words by spaces. Often one wants to iterate through the elements of a list while keeping track of the indices of the elements. Python provides enumerate(L) for this purpose.
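As a small illustration of the two ideas in this paragraph, the sketch below pairs each document with its number using enumerate and lowercases it before splitting on whitespace. The sample documents are hypothetical stand-ins for lines of the real file.

```python
# Hypothetical sample documents; the real input would come from the file.
documents = ["The Wall fell .", "a wall street bank"]

for doc_number, document in enumerate(documents):
    # lower() makes matching case-insensitive; split() breaks on whitespace,
    # so a period that is separated by spaces stays as its own token.
    words = document.lower().split()
    print(doc_number, words)
```

Because punctuation is already separated from words by spaces, a plain whitespace split is enough here; no regular expression is needed.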
a) Write a procedure make_inverse_index(strlist) that, given a list of strings (documents), returns a dictionary that maps each word to the set consisting of the document numbers of documents in which that word appears. This dictionary is called an inverse index. (Hint: use enumerate.)
b) Write a procedure or_search(inverseIndex, query) which takes an inverse index and a list of words query, and returns the set of document numbers specifying all documents that contain any of the words in query.
c) Write a procedure and_search(inverseIndex, query) which takes an inverse index and a list of words query, and returns the set of document numbers specifying all documents that contain all of the words in query.
d) Try out your procedures on stories.txt. Run or_search and and_search with the queries “united states” and “wall street” on the documents in stories.txt and save the resulting lists (two lists of document ids per query) in a file named results.txt.
Explanation / Answer
"""A word search solver"""
from collections import namedtuple
from itertools import product
import re
import sys

Direction = namedtuple('Direction', 'di dj name')

DIRECTIONS = [
    Direction(-1, -1, "up and to the left"),
    Direction(-1,  0, "up"),
    Direction(-1, +1, "up and to the right"),
    Direction( 0, -1, "left"),
    Direction( 0, +1, "right"),
    Direction(+1, -1, "down and to the left"),
    Direction(+1,  0, "down"),
    Direction(+1, +1, "down and to the right"),
]

def read_grid(filename):
    """
    Read a word search puzzle from a file into a 2D matrix of uppercase letters.
    """
    with open(filename) as f:
        return [re.findall('[A-Z]', line.upper()) for line in f]

def extract(grid, i, j, dir, length):
    """
    Extract letters from the grid, starting at row i column j, as a string.
    If the extraction will walk out of bounds, return None.
    """
    if (0 <= i + (length - 1) * dir.di < len(grid) and
            0 <= j + (length - 1) * dir.dj < len(grid[i])):
        return ''.join(
            grid[i + n * dir.di][j + n * dir.dj] for n in range(length)
        )
    return None

def search(grid, word):
    """
    Search for a word in a grid, returning a tuple of the starting row,
    starting column, and direction. If the word is not found, return None.
    """
    word_len = len(word)
    for i, j, dir in product(range(len(grid)), range(len(grid[0])), DIRECTIONS):
        if word == extract(grid, i, j, dir, word_len):
            return i, j, dir
    return None

def main(filename, word):
    grid = read_grid(filename)
    match = search(grid, word.upper())
    if match is None:
        print("Didn't find a match.")
    else:
        i, j, dir = match
        print("Found a match at line {0}, column {1} going {2}".format(
            i + 1, j + 1, dir.name))

if __name__ == '__main__':
    main('input.txt', sys.argv[1])
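For the inverse index that the question asks about, parts a) through d) could be approached along the following lines. This is a minimal sketch, not a definitive solution. It assumes that words are separated by whitespace (as the question says punctuation is) and that an empty query to and_search should match nothing. The helper run_queries and the sample documents are hypothetical names introduced here for illustration.

```python
def make_inverse_index(strlist):
    """Map each lowercased word to the set of document numbers containing it."""
    inverse_index = {}
    for doc_number, document in enumerate(strlist):
        for word in document.lower().split():
            inverse_index.setdefault(word, set()).add(doc_number)
    return inverse_index


def or_search(inverseIndex, query):
    """Return the documents containing at least one word in query."""
    result = set()
    for word in query:
        result |= inverseIndex.get(word.lower(), set())
    return result


def and_search(inverseIndex, query):
    """Return the documents containing every word in query."""
    if not query:
        return set()  # assumption: an empty query matches nothing
    word_sets = [inverseIndex.get(word.lower(), set()) for word in query]
    return set.intersection(*word_sets)


def run_queries(doc_path, out_path, queries):
    """Part d): index the file at doc_path and write both results per query."""
    with open(doc_path) as f:
        index = make_inverse_index(f.read().splitlines())
    with open(out_path, "w") as out:
        for query in queries:
            words = query.split()
            out.write(f"or_search {query!r}: {sorted(or_search(index, words))}\n")
            out.write(f"and_search {query!r}: {sorted(and_search(index, words))}\n")


# Hypothetical in-memory documents standing in for stories.txt.
docs = [
    "Wall Street rallied .",
    "The United Nations met .",
    "united states on wall street",
]
index = make_inverse_index(docs)
print(or_search(index, ["united", "states"]))   # documents 1 and 2
print(and_search(index, ["wall", "street"]))    # documents 0 and 2
```

With the real data, a call such as run_queries("stories.txt", "results.txt", ["united states", "wall street"]) would write two lines per query to results.txt. Storing sets rather than lists in the index means a word repeated within one document is recorded only once, and the union and intersection operations map directly onto or_search and and_search.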