
Write a class ListParser that is a subclass of the HTMLParser class


Question

Write a class ListParser that is a subclass of the HTMLParser class. It will find and collect the contents of all the list items, both ordered and unordered, in an HTML file fed into it. The parser works by identifying and remembering, via a class variable, when a list item tag has been encountered. When the data handler for the class is called and the class variable indicates that a list item is currently open, the data in the list item is added to a list in the class. When the list item tag is closed, the class adjusts the internal variable to register this. To implement this parser you will need to override the following methods of the HTMLParser class:

__init__: The constructor should call the constructor for the HTMLParser class and create and initialize the necessary class variables.

handle_starttag: If the tag that resulted in the method being called is a list item, the appropriate class variable should be set.

handle_endtag: If the tag that resulted in the method being called is a list item, the appropriate class variable should be unset.

handle_data: If the parser is currently inside a list item, the data should be added to the list of list items in the class. Strip any extra spaces or newlines off the contents of the list item before appending it to the class variable.

getItems: Returns the list of list items collected by the class.
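To see when each of the handlers listed above fires, it can help to record the calls in order. The following is a minimal sketch, not part of the lab template: TraceParser and its events attribute are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TraceParser(HTMLParser):
    """Hypothetical helper: records every handler call in the order it happens."""

    def __init__(self):
        super().__init__()
        self.events = []  # (kind, value) tuples, in call order

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        # data arrives unstripped, exactly as it appears between the tags
        self.events.append(('data', data))


tracer = TraceParser()
tracer.feed('<ul><li> Cat </li><li>Dog</li></ul>')
tracer.close()
for event in tracer.events:
    print(event)
# ('start', 'ul')
# ('start', 'li')
# ('data', ' Cat ')
# ('end', 'li')
# ('start', 'li')
# ('data', 'Dog')
# ('end', 'li')
# ('end', 'ul')
```

The text of a list item always arrives between its start-tag and end-tag calls, which is why a flag set in handle_starttag and cleared in handle_endtag is enough to tell handle_data whether it is inside a list item.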

You can find a template for the class and a test function testLParser() in the lab zip file. The following shows what the test function would display on some sample web pages:

[Screenshot of a Python 3.6.0 shell session, partially legible. testLParser is called on three pages under http://facweb.cdm.depaul.edu/asettle/csc242/web/: list1.html, list2.html, and test.html.
Legible items for list1.html: Item 1, Item 22, Item A, Item B, Item B1, Item B2, Item B3, Item C, X, Y.
Legible items for list2.html: Cat, Dog, Hermit crab, Java, C++, Lisp, Scheme, Python, English, German, Finnish, Spanish, Work days, Monday, Montag, maanantai, Weekend, Saturday, Samstag, lauantai.
The output for test.html is not legible.]

Explanation / Answer

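An HTMLParser subclass that overrides the handlers looks like this in Python 3. This one prints every run of text in the page it fetches:

from html.parser import HTMLParser
from urllib.request import urlopen

class LinksParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.seen = {}

    def handle_starttag(self, tag, attributes):
        # Only <div> tags are inspected; every other tag is ignored.
        if tag != 'div':
            return
        for name, value in attributes:
            if name == 'id' and value == 'remository':
                # print(value)
                return

    def handle_data(self, data):
        # Called for every run of text between tags.
        print(data)

p = LinksParser()
f = urlopen("http://domain.com/somepage.html")
html = f.read().decode()  # bytes -> str, assuming the page is UTF-8
p.feed(html)
p.close()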
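The ListParser that the question describes can be sketched along the following lines. This is one possible reading of the specification, not the lab's official solution: the attribute names in_item and items are assumptions, the state is stored as instance attributes, and skipping whitespace-only text is an extra choice the question does not ask for. The sample HTML is made up for the demonstration.

```python
from html.parser import HTMLParser


class ListParser(HTMLParser):
    """Collect the stripped text of every <li> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.in_item = False  # assumed name: True while an <li> is open
        self.items = []       # assumed name: text collected so far

    def handle_starttag(self, tag, attrs):
        # Tag names are passed in lower case by HTMLParser.
        if tag == 'li':
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == 'li':
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            text = data.strip()
            if text:  # assumption: ignore whitespace-only text
                self.items.append(text)

    def getItems(self):
        return self.items


# Made-up sample input; not one of the course pages.
sample = """
<ol>
  <li>Item 1</li>
  <li> Item 2
  </li>
</ol>
<ul>
  <li>Item A</li>
</ul>
<p>Not in a list</p>
"""

parser = ListParser()
parser.feed(sample)
parser.close()
items = parser.getItems()
print(items)  # ['Item 1', 'Item 2', 'Item A']
```

Because the state is a single boolean, closing a nested list item also clears the flag for its parent, so any text that follows a nested list inside the outer item is not collected. A depth counter in place of the boolean would handle that case if it matters.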
