Write a class ListParser that is a subclass of the HTMLParser class. It will fin
Question
Write a class ListParser that is a subclass of the HTMLParser class. It will find and collect the contents of all the list items, both ordered and unordered, in an HTML file fed into it. The parser works by identifying and remembering, via a class variable, when a list item tag has been encountered. When the data handler for the class is called and the class variable indicates that a list item is currently open, the data in the list item is added to a list in the class. When the list item tag is closed, the class adjusts the internal variable to register this. To implement this parser you will need to override the following methods of the HTMLParser class:
__init__: the constructor should call the constructor for the HTMLParser class and create and initialize the necessary class variables
handle_starttag: If the tag that resulted in the method being called is a list item, the appropriate class variable should be set.
handle_endtag: If the tag that resulted in the method being called is a list item, the appropriate class variable should be unset.
handle_data: If the parser is currently inside a list item, the data should be added to the list of list items in the class. Strip any extra spaces or newlines off the contents of the list item before appending it to the class variable.
getItems: Returns the list of list items collected by the class.
You can find a template for the class and a test function testLParser() in the lab zip file. The following shows what the test function would display on some sample web pages:
[Screenshot of a Python 3.6.0 shell session: testLParser() is called on http://facweb.cdm.depaul.edu/asettle/csc242/web/list1.html, http://facweb.cdm.depaul.edu/asettle/csc242/web/list2.html and http://facweb.cdm.depaul.edu/asettle/csc242/web/test.html. The printed lists are only partly legible in the extracted text. For list1.html they include entries such as 'Item 1', 'Item A', 'Item B', 'Item B1', 'Item B2', 'Item B3', 'Item C', 'X' and 'Y'; for list2.html they include 'Cat', 'Dog', 'Hermit crab', 'Java', 'C++', 'Lisp', 'Scheme', 'Python', 'English', 'German', 'Finnish', 'Spanish', 'Work days', 'Monday', 'Montag', 'maanantai', 'Weekend', 'Saturday' and 'lauantai'.]
Explanation / Answer
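from html.parser import HTMLParser
from urllib.request import urlopen

class LinksParser(HTMLParser):
    def __init__(self):
        # Initialize the base parser before adding our own state.
        HTMLParser.__init__(self)
        self.seen = {}

    def handle_starttag(self, tag, attributes):
        # Only <div> tags are inspected; everything else is ignored.
        if tag != 'div':
            return
        for name, value in attributes:
            if name == 'id' and value == 'remository':
                # print(value)
                return

    def handle_data(self, data):
        # Prints every text node, regardless of the enclosing tag.
        print(data)

p = LinksParser()
f = urlopen("http://domain.com/somepage.html")
# feed() expects str, so decode the bytes first (UTF-8 is assumed here).
html = f.read().decode('utf-8', errors='replace')
p.feed(html)
p.close()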
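Note that the snippet above looks at div tags and prints all text, so it does not yet collect list items. As a rough illustration of the approach the question describes, here is a minimal sketch, assuming Python 3 and the standard html.parser module. The attribute names in_item and items are hypothetical, and skipping chunks that are empty after stripping is an assumption the question does not spell out. Because it uses a single flag, text that follows a nested list inside the same list item would be missed.

```python
from html.parser import HTMLParser


class ListParser(HTMLParser):
    """Collect the stripped text of every <li> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.in_item = False  # hypothetical name: True while an <li> is open
        self.items = []       # hypothetical name: collected list-item text

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so 'li' covers <LI> as well.
        if tag == 'li':
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == 'li':
            self.in_item = False

    def handle_data(self, data):
        text = data.strip()
        # Assumption: whitespace-only chunks are skipped rather than stored.
        if self.in_item and text:
            self.items.append(text)

    def getItems(self):
        return self.items


parser = ListParser()
parser.feed('<ul><li> Cat </li><li>Dog\n</li></ul><p>not a list item</p>')
parser.close()
print(parser.getItems())  # ['Cat', 'Dog']
```

The same parser object handles ordered and unordered lists alike, since both use li for their items; only the flag set in handle_starttag and cleared in handle_endtag decides whether handle_data keeps a piece of text.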