PYTHON 1. We will perform the same task using the HTMLParser tool. When properly

Question

PYTHON

1. We will perform the same task using the HTMLParser tool. When properly working, the program should be given a URL, and print (to the console) all of the headlines on the page.

In order to do this, we will need to write all 3 handler methods for HTMLParser. Here is a brief description of what each should do:

handle_starttag: Checks the tag to see if it is a headline tag (<h1>, <h2>, or <h3>). We will not concern ourselves with a threshold for this assignment. If the tag is a headline tag, a flag is set to indicate that a headline element has been found. The handle_data method relies on this flag to extract the headline.

handle_data: Checks the flag set by handle_starttag to see if we are in a headline element. If so, the headline text should be printed.

handle_endtag: Sets the flag back to False, indicating that we are no longer in a headline element.
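
To make the interplay between the three handlers concrete, here is a minimal sketch that only records the order in which HTMLParser calls them. The class name EventLogger and the sample HTML string are made up for illustration and are not part of the assignment.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Hypothetical helper: records every handler call in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_data(self, data):
        self.events.append(('data', data))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

logger = EventLogger()
logger.feed('<h1>Top story</h1><p>Body text</p>')
logger.close()
for event in logger.events:
    print(event)
```

Each piece of text arrives in handle_data between the matching start and end calls, which is why a flag set in handle_starttag and cleared in handle_endtag is enough to tell headline text from other text.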

from urllib.request import *
from urllib.error import *
from html.parser import *
from urllib.parse import urljoin

class Headlines(HTMLParser):

    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url
        self.tag = None
        self.f = open('headlines.html','w')

    def handle_starttag(self, tag, attrs):
        if tag in ['h1', 'h2', 'h3']:
            pass   # REPLACE THIS

    def handle_data(self, data):
        if self.tag != None:
            pass   # REPLACE THIS

    def handle_endtag(self, tag):
        if tag in ['h1', 'h2', 'h3']:
            pass # REPLACE THIS

    def headlines(self):
        contents = urlopen(self.url).read().decode()
        self.feed(contents)
        self.f.close()

Explanation / Answer

from urllib.request import *
from urllib.error import *
from html.parser import *
from urllib.parse import urljoin

class Headlines(HTMLParser):

    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url
        self.tag = None   # the flag: name of the current headline tag, or None
        self.f = open('headlines.html','w')

    def handle_starttag(self, tag, attrs):
        if tag in ['h1', 'h2', 'h3']:
            # Store the flag on self so handle_data can see it;
            # a local variable would be lost when this method returns.
            self.tag = tag

    def handle_data(self, data):
        if self.tag != None:
            # We are inside a headline element, so print its text.
            print(data)

    def handle_endtag(self, tag):
        if tag in ['h1', 'h2', 'h3']:
            # Leaving the headline element: reset the flag.
            self.tag = None

    def headlines(self):
        contents = urlopen(self.url).read().decode()
        self.feed(contents)
        self.f.close()
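
One way to check the flag logic without a network connection is to feed the parser a literal HTML string. The sketch below assumes a hypothetical class HeadlineCollector that stores headlines in a list instead of printing them, purely so the result can be inspected. The names in_headline, found, and sample are illustrative, not from the assignment.

```python
from html.parser import HTMLParser

HEADLINE_TAGS = ('h1', 'h2', 'h3')

class HeadlineCollector(HTMLParser):
    """Hypothetical variant: collects headline text rather than printing it."""

    def __init__(self):
        super().__init__()
        self.in_headline = False   # flag set by handle_starttag
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADLINE_TAGS:
            self.in_headline = True

    def handle_data(self, data):
        if self.in_headline:
            self.found.append(data.strip())

    def handle_endtag(self, tag):
        if tag in HEADLINE_TAGS:
            self.in_headline = False

# <h4> and <p> are not headline tags here, so their text is skipped.
sample = '<h1>News</h1><p>skip me</p><h2> Sports </h2><h4>too small</h4>'
collector = HeadlineCollector()
collector.feed(sample)
collector.close()
print(collector.found)
```

The same string-based check works for a class that prints, since feed accepts any HTML text and does not care whether it came from urlopen.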
