
Write a class ListParser that is a subclass of the HTMLParser class


Question

Write a class ListParser that is a subclass of the HTMLParser class. It will find and collect the contents of all the list items, both ordered and unordered, in an HTML file fed into it. The parser works by identifying and remembering, via a class variable, when a list item tag has been encountered. When the data handler for the class is called and the class variable indicates that a list item is currently open, the data in the list item is added to a list in the class. When the list item tag is closed, the class adjusts the internal variable to register this. To implement this parser you will need to override the following methods of the HTMLParser class:

__init__: the constructor should call the constructor for the HTMLParser class and create and initialize the necessary class variables

handle_starttag: If the tag that resulted in the method being called is a list item, the appropriate class variable should be set.

handle_endtag: If the tag that resulted in the method being called is a list item, the appropriate class variable should be unset.

handle_data: If the parser is currently inside a list item, the data should be added to the list of list items in the class. Strip any extra spaces or newlines off the contents of the list item before appending it to the class variable.

getItems: Returns the list of list items collected by the class.
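
For orientation, the sketch below shows the order in which HTMLParser calls the three handlers on a small list. EventTracer and the sample HTML string are illustrative names chosen here; they are not part of the assignment template.

```python
from html.parser import HTMLParser


class EventTracer(HTMLParser):
    """Illustrative helper: record every handler call in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


tracer = EventTracer()
tracer.feed("<ul><li>Apples</li><li>Pears</li></ul>")
tracer.close()

for event in tracer.events:
    print(event)
```

Each piece of list-item text arrives in handle_data between the 'li' start and end events, which is why a flag set in handle_starttag and cleared in handle_endtag is enough to tell the data handler whether it is inside a list item.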

You can find a template for the class and a test function testLParser() in the assignment zip file. The following shows what the test function would display on some sample web pages:

Explanation / Answer


from html.parser import HTMLParser
from urllib.request import urlopen


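class ListParser(HTMLParser):
    'Collect the text of every list item (<li>) in an HTML document'

    def __init__(self):
        super().__init__()
        self.lstLst = []         # text collected from list items
        self.insideList = False  # True while an <li> tag is open

    def handle_starttag(self, tag, attrs):
        # <li> is used by both ordered (<ol>) and unordered (<ul>) lists
        if tag == 'li':
            self.insideList = True

    def handle_endtag(self, tag):
        if tag == 'li':
            self.insideList = False

    def handle_data(self, data):
        if self.insideList:
            text = data.strip()
            if text:  # skip chunks that contain only whitespace
                self.lstLst.append(text)

    def getItems(self):
        return self.lstLst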

def testLParser(url):
    'Test the ListParser class'
    content = urlopen(url).read().decode()
    lParser = ListParser()
    lParser.feed(content)
    return lParser.getItems()
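
One thing to be aware of: handle_data is called once per run of text, so a list item that contains nested tags (for example a link) ends up as several separate entries. If one entry per list item is wanted, a possible variation is to buffer the text and store it when the closing tag arrives. The sketch below is an assumption about what might be desired, not something the assignment asks for; BufferedListParser, its attribute names, and the sample HTML are hypothetical, and nested lists are not handled.

```python
from html.parser import HTMLParser


class BufferedListParser(HTMLParser):
    """Hypothetical variant: one entry per <li>, even with nested tags."""

    def __init__(self):
        super().__init__()
        self.items = []           # one string per completed list item
        self.inside_item = False  # True while an <li> tag is open
        self.chunks = []          # text pieces of the current list item

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.inside_item = True
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "li" and self.inside_item:
            # Join the pieces and collapse runs of whitespace and newlines
            text = " ".join("".join(self.chunks).split())
            if text:
                self.items.append(text)
            self.inside_item = False

    def handle_data(self, data):
        if self.inside_item:
            self.chunks.append(data)

    def getItems(self):
        return self.items


# Feeding a string directly avoids the need for a live URL
demo = BufferedListParser()
demo.feed('<ul><li>Visit <a href="#">our site</a> today</li>'
          '<li>\n  Plain item\n</li></ul>')
demo.close()
print(demo.getItems())  # ['Visit our site today', 'Plain item']
```

Feeding an HTML string to the parser, as done here, is also a convenient way to check ListParser itself without network access.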
