Question

PYTHON CHECK SIMPLE CODE

Write a recursive function findTitles() that takes a string representing the name of a directory as a parameter and returns a list of all the titles found inside the HTML files in that directory, in subdirectories of that directory, and so on. Files that appear in more than one directory will only appear once in the title list returned. The titles in the list do not have to be ordered in any particular way.

Please note that your function must be recursive and that it must work as described on any directory and file structure, not just the ones provided as examples. You may not make any assumptions about the nesting of subdirectories, either with respect to how many subdirectories exist at each level or how deep the nesting is. You may not change the number of parameters of the function, define any helper functions, or use any global variables, and the function must return, not print, the list of unique titles. It is expected that a correct recursive solution to the problem will contain at least one loop. Your code must work on any operating system.
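For concreteness, a correct implementation would be used as sketched below; the directory name and the titles shown are hypothetical examples, not values from the question:

>>> findTitles('myPages')
['Home Page', 'About Us', 'Contact']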

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class Collector(HTMLParser):
    'collects the absolute URLs of the hyperlinks found in a web page'

    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url    # URL of the page being parsed
        self.links = []   # absolute URLs of the hyperlinks found so far

    def handle_starttag(self, tag, attrs):
        # resolve each anchor tag's href against the page URL
        if tag == 'a':
            for attr in attrs:
                if attr[0] == 'href':
                    absolute = urljoin(self.url, attr[1])
                    if absolute[:4] == 'http':
                        self.links.append(absolute)

    def getLinks(self):
        return self.links


class Crawler(object):
    def __init__(self):
        self.visited = set()   # URLs crawled so far
        self.fileList = []     # initialized so getFiles() has something to print

    def getFiles(self):
        print(self.fileList)

    def reset(self):
        self.visited = set()

    def crawl(self, url):
        'recursive web crawler that calls analyze() on each web page'
        self.visited.add(url)
        links = self.analyze(url)
        for link in links:
            if link not in self.visited:
                try:
                    self.crawl(link)
                except Exception:
                    pass   # skip pages that cannot be fetched or parsed

    def analyze(self, url):
        'fetches url and returns the list of hyperlinks found on that page'
        content = urlopen(url).read().decode()
        collector = Collector(url)
        collector.feed(content)
        return collector.getLinks()
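For reference, the crawler scaffolding above is driven roughly as follows; the starting URL is a placeholder rather than one supplied with the question:

crawler = Crawler()
crawler.crawl('http://example.com/index.html')
print(crawler.visited)   # every URL reached from the start page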
  
  

Explanation / Answer

import os
import glob

path = 'pathToYourDirectory/'

# lists the HTML files sitting directly inside path (no recursion yet)
for infile in glob.glob(os.path.join(path, '*.html')):
    print("current file is: " + infile)
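The glob snippet above only lists the files sitting directly in one directory; it does not recurse into subdirectories, extract titles, or remove duplicates, so it does not by itself answer the question. Below is a minimal recursive sketch of findTitles() that follows the stated constraints (one parameter, no helper functions, no globals, returns rather than prints). It assumes that HTML files end in '.html' and that each title appears between literal lowercase <title> and </title> tags; both assumptions are mine, not the question's.

import os

def findTitles(directory):
    'returns a list of the unique titles of the HTML files under directory'
    titles = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)  # OS-independent path handling
        if os.path.isdir(path):
            # recursive case: merge the titles found in the subdirectory
            for title in findTitles(path):
                if title not in titles:       # required deduplication
                    titles.append(title)
        elif name.endswith('.html'):
            infile = open(path)
            content = infile.read()
            infile.close()
            # base case: pull out the text between <title> and </title>
            start = content.find('<title>')
            end = content.find('</title>')
            if start != -1 and end != -1:
                title = content[start + len('<title>'):end]
                if title not in titles:
                    titles.append(title)
    return titles

os.listdir() plus os.path.isdir() drives the recursion, os.path.join() keeps the path handling portable across operating systems, and the membership checks provide the required deduplication.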