Python Write a class ImageParser that is a subclass of the HTMLParser class. It
ID: 3678041 • Letter: P
Question
Python
Write a class ImageParser that is a subclass of the HTMLParser class. It will find and collect the absolute URLs for all images found in an HTML file. The parser should collect the absolute URLs only for images displayed using image tags. Any image not in an image tag should not be collected.
The class should maintain two instance variables: a list to hold the absolute URLs of the images in the HTML file, and the URL upon which the parser was called. To implement this parser you will need to override two methods of the HTMLParser class and add one of your own:
__init__: the constructor should call the constructor for the HTMLParser class and create and initialize the necessary instance variables
handle_starttag: If the tag that resulted in the method being called is an image tag, the absolute URL of the image should be added to the list maintained in the class. The URL upon which the parser was called is useful for transforming relative URLs into absolute URLs.
getImgs: returns the list of image URLs collected
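As a minimal sketch of the relative-to-absolute step mentioned for handle_starttag, the standard library function urllib.parse.urljoin can resolve an image path against the page URL. The page URL and image paths below are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
page_url = "http://example.com/gallery/index.html"

# A relative path is resolved against the directory of the page.
relative = urljoin(page_url, "pics/cat.png")

# A root-relative path replaces the whole path of the page URL.
rooted = urljoin(page_url, "/static/logo.gif")

# An already-absolute URL is returned unchanged.
absolute = urljoin(page_url, "http://other.example.org/dog.jpg")

print(relative)  # http://example.com/gallery/pics/cat.png
print(rooted)    # http://example.com/static/logo.gif
print(absolute)  # http://other.example.org/dog.jpg
```

Because urljoin leaves absolute URLs untouched, a parser can pass every src value through it without first checking whether the value is relative.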
The following shows what the test function would display on a couple of sample web pages. Your solution must work on any valid HTML page, not just the examples provided:
Sample code
from html.parser import HTMLParser
from urllib.request import urlopen

class ImageParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        pass

    def getImgs(self):
        pass

def returnImagesList(url):
    content = urlopen(url)
    content = content.read()
    content = content.decode()
    collector = ImageParser(url)
    collector.feed(content)
    imgs = collector.getImgs()
    return imgs
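When filling in handle_starttag, it may help to see what HTMLParser actually passes to it. The sketch below uses a hypothetical helper class, TagRecorder, that is not part of the assignment and only records the arguments it receives. It suggests three things a solution should allow for: tag and attribute names arrive in lowercase, self-closing tags still reach handle_starttag through the default handle_startendtag, and an image tag may have no src attribute at all.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper that records every start tag it is told about."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (tag, attrs) pairs in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        self.seen.append((tag, attrs))


recorder = TagRecorder()
recorder.feed('<IMG SRC="Cat.PNG"><img src="dog.gif" /><img alt="no source">')
recorder.close()

for tag, attrs in recorder.seen:
    print(tag, attrs)
```

Attribute values keep their original case, so only the names need no extra normalization.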
Explanation / Answer
Here is one way to do this. If you need any further help, please comment.
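from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImageParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url      # page URL, used to resolve relative image paths
        self.images = []    # absolute URLs of the images found so far

    def handle_starttag(self, tag, attrs):
        # Only img tags count; the image location is in the src attribute.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.images.append(urljoin(self.url, value))

    def getImgs(self):
        return self.images

def get_sysimage(url):
    html = urlopen(url).read().decode()
    parser = ImageParser(url)
    parser.feed(html)
    images = parser.getImgs()
    print(images)
    return images

urllist = []  # add the page URLs to scan here
for url in urllist:
    get_sysimage(url)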