
Write a recursive function findTitles() that takes a string representing the name of a directory


Question

Write a recursive function findTitles() that takes a string representing the name of a directory as a parameter and returns a list of all the titles found inside the HTML files in that directory, in subdirectories of that directory, and so on. Files that appear in more than one directory will only appear once in the title list returned. The titles in the list do not have to be ordered in any particular way.

Please note that your function must be recursive and that it must work as described on any directory and file structure, not just the ones provided as examples. You may not make any assumption about the nesting of subdirectories, either with respect to how many subdirectories exist at each level or how deep the nesting of subdirectories is. You may not change the number of parameters of the function, define any helper functions, or use any global variables, and the function must return, not print, the list of unique titles. It is expected that a correct recursive solution to the problem will contain at least one loop. Your code must work on any operating system.

Sample code (a recursive web crawler built on HTMLParser):


from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class Collector(HTMLParser):
    'collects the hyperlink URLs found in one web page'

    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url        # URL of the page being parsed
        self.links = []       # absolute http URLs collected so far

    def handle_starttag(self, tag, attrs):
        # record the absolute URL of every anchor tag's href attribute
        if tag == 'a':
            for attr in attrs:
                if attr[0] == 'href':
                    absolute = urljoin(self.url, attr[1])
                    if absolute[:4] == 'http':
                        self.links.append(absolute)

    def getLinks(self):
        return self.links


class Crawler(object):
    'recursive web crawler'

    def __init__(self):
        self.visited = set()   # URLs crawled so far

    def getFiles(self):
        # note: self.fileList is never assigned anywhere in this snippet,
        # so calling this method as written raises an AttributeError
        print(self.fileList)

    def reset(self):
        self.visited = set()

    def crawl(self, url):
        'recursive web crawler that calls analyze() on each web page'
        self.visited.add(url)
        links = self.analyze(url)
        for link in links:
            if link not in self.visited:
                try:
                    self.crawl(link)
                except:
                    pass

    def analyze(self, url):
        'returns the list of http links found in the page at the given URL'
        content = urlopen(url).read().decode()
        collector = Collector(url)
        collector.feed(content)
        urls = collector.getLinks()
        return urls

Sample IDLE session (Python 3.5.0) from the question's screenshot:

>>> crawler = Crawler()
>>> crawler.crawl('http://facweb.cdm.depaul.edu/asettle/csc242/web/fEOne.html')
>>> crawler.titles
['First test page', 'Test page']
>>> crawler = Crawler()
>>> crawler.crawl('http://facweb.cdm.depaul.edu/asettle/csc242/web/one.html')
>>> crawler.titles
['Page One', 'Page Three', 'Page Four', 'Page Five', 'Page Two']
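
The Collector above gathers hyperlinks, but the assignment asks for page titles, so a similar HTMLParser subclass could record the text inside <title> tags instead. The following is a minimal sketch; the class name TitleCollector and its attributes are illustrative and not part of the original code:

from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    'collects the text of the <title> element of one HTML document'

    def __init__(self):
        HTMLParser.__init__(self)
        self.inTitle = False   # True while parsing inside <title> ... </title>
        self.title = ''        # accumulated title text

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.inTitle = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.inTitle = False

    def handle_data(self, data):
        if self.inTitle:
            self.title += data

    def getTitle(self):
        return self.title

After feed() is given the decoded file contents, getTitle() returns whatever text appeared inside the document's <title> element.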

Explanation / Answer


The key tools are in the standard library's os module: os.listdir() returns the names of the entries in a single directory, and os.walk() visits a directory and every subdirectory below it. For example, the following walks the current directory and prints the path of every subdirectory and file it finds:

import os

# walk the current directory, printing every subdirectory and file path
for nameOfDirectory, dirnames, filenames in os.walk('.'):
    for nameOfSubDirectory in dirnames:
        print(os.path.join(nameOfDirectory, nameOfSubDirectory))

    for nameOfFile in filenames:
        print(os.path.join(nameOfDirectory, nameOfFile))

# the same traversal starting from the root of the file system
for top, dirs, files in os.walk('/'):
    for name in files:
        print(os.path.join(top, name))
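
os.walk() does the recursion internally, but the assignment requires findTitles() itself to be recursive, so a more direct fit is os.listdir() plus an explicit recursive call on every subdirectory. Below is one possible sketch, not the official solution: it assumes HTML files can be recognized by a .html extension and that the title is the text between <title> and </title> (extracted here with simple string searching; the TitleCollector class sketched above could be used instead). Using a set removes duplicate titles before the list is returned, and os.path.join() keeps paths portable across operating systems.

import os


def findTitles(directory):
    'recursively returns the unique titles of all HTML files under directory'
    titles = set()                                   # a set discards duplicates
    for name in os.listdir(directory):
        path = os.path.join(directory, name)         # OS-independent path building
        if os.path.isdir(path):
            titles.update(findTitles(path))          # recursive call on the subdirectory
        elif name.endswith('.html'):
            with open(path, 'r', encoding='utf-8') as infile:
                content = infile.read()
            start = content.find('<title>')          # naive title extraction
            end = content.find('</title>')
            if start != -1 and end != -1:
                titles.add(content[start + len('<title>'):end])
    return list(titles)

As sketched, the function has a single parameter, defines no helper functions, uses no global variables, contains loops, and returns (rather than prints) its result, which matches the stated constraints.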
