
Python: Write a class ImageParser that is a subclass of the HTMLParser class


Question

Python

Write a class ImageParser that is a subclass of the HTMLParser class. It will find and collect the absolute URLs for all images found in an HTML file. The parser should collect the absolute URLs only for images displayed using image tags. Any image not in an image tag should not be collected.
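To see what such a parser has to work with, here is a minimal sketch of how HTMLParser reports tags to handle_starttag. The TagLogger class and the sample HTML string are hypothetical, made up for illustration; only HTMLParser, feed and handle_starttag come from the standard library.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag the parser reports."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) tuples
        # whose names are lowercased as well.
        self.seen.append((tag, attrs))


logger = TagLogger()
logger.feed('<p>Hi</p><IMG SRC="pics/cat.png" alt="cat"><a href="dog.jpg">dog</a>')
for tag, attrs in logger.seen:
    print(tag, attrs)
```

In this sample the IMG tag is reported as 'img' with a 'src' attribute, while dog.jpg shows up only inside an 'a' tag, which is the kind of image reference the assignment says should not be collected.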

The class should maintain two variables: a list to hold the absolute URLs of the images in the HTML file, and the URL upon which the parser was called. To implement this parser you will need to override the following methods of the HTMLParser class:

__init__: the constructor should call the constructor for the HTMLParser class and create and initialize the necessary class variables

handle_starttag: If the tag that resulted in the method being called is an image tag, the absolute URL of the image should be added to the list maintained in the class. The URL upon which the parser was called is useful for transforming relative URLs into absolute URLs.

getImgs: returns the list of image URLs collected
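One way to turn a relative URL into an absolute one is urljoin from the standard urllib.parse module. The sketch below assumes a made-up page URL (example.com is a placeholder, not part of the assignment) and shows how different kinds of src values resolve against it.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
page = "http://example.com/gallery/index.html"

# A bare file name resolves against the page's directory.
print(urljoin(page, "cat.png"))
# ".." steps up one directory before resolving.
print(urljoin(page, "../img/dog.gif"))
# A leading slash resolves against the site root.
print(urljoin(page, "/logo.jpg"))
# An already absolute URL is returned unchanged.
print(urljoin(page, "http://other.example.org/pic.png"))
```

Because absolute URLs pass through unchanged, the same call can be applied to every src value without first checking whether it is relative.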

The following shows what the test function would display on a couple of sample web pages. Your solution must work on any valid HTML page, not just the examples provided:

Sample code

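from html.parser import HTMLParser
from urllib.request import urlopen

class ImageParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        pass

    def getImgs(self):
        pass

def returnImagesList(url):
    # Download the page and decode the bytes into text
    content = urlopen(url)
    content = content.read()
    content = content.decode()
    # Parse the page and return the image URLs that were collected
    collector = ImageParser(url)
    collector.feed(content)
    imgs = collector.getImgs()
    return imgs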

>>> returnImagesList("http://facweb.cdm.depaul.edu/asettle/csc242/web/fEOne.htm")
['http://facweb.cdm.depaul.edu/asettle/csc242/web/mario_luigi_kirby_and_tiff_by_koopa_master-d512k3b.png', 'http://facweb.cdm.depaul.edu/asettle/csc242/web/peace.gif']
>>> returnImagesList("http://facweb.cdm.depaul.edu/asettle/csc242/web/fETwo.html")
['http://facweb.cdm.depaul.edu/asettle/csc242/web/cookie.jpg']
>>> returnImagesList("http://facweb.cdm.depaul.edu/asettle/csc242/web/test.html")
[]

Explanation / Answer

I can help you with this. If you need any further help, please comment.

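from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImageParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url      # page URL, used to resolve relative image paths
        self.images = []    # absolute URLs of the images found so far

    def handle_starttag(self, tag, attrs):
        # Only image tags count; the parser lowercases tag and attribute names
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    # urljoin leaves absolute URLs unchanged
                    self.images.append(urljoin(self.url, value))

    def getImgs(self):
        return self.images

def returnImagesList(url):
    # Download the page, decode it, and run it through the parser
    content = urlopen(url).read().decode()
    collector = ImageParser(url)
    collector.feed(content)
    return collector.getImgs()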