Question

class Tag:
    # Constructor - initializes the instance variables __tag_name and __tag_type
    def __init__(self, tag_name, tag_type):
        self.__TYPE_LIST = ["start", "end", "empty"]
        self.__tag_name = tag_name
        try:
            # tag_type is an index into __TYPE_LIST: 0 = start, 1 = end, 2 = empty
            self.__tag_type = self.__TYPE_LIST[tag_type]
        except (IndexError, TypeError):
            raise Exception("Invalid tag type")

    # Returns the tag name
    def get_tag_name(self):
        return self.__tag_name

    # Returns the tag type
    def get_tag_type(self):
        return self.__tag_type

    # Checks whether two tags are matching tags - an opening and closing pair
    def is_match(self, other):
        if self.__tag_name == other.__tag_name:
            if ((self.__tag_type == "start" and other.__tag_type == "end")
                    or (self.__tag_type == "end" and other.__tag_type == "start")):
                return True
            else:
                return False
        else:
            return False

    # Checks whether two tags are the same.
    # Tags are the same only if they have the same name and are of the same type.
    def __eq__(self, other):
        if self.__tag_name == other.__tag_name and self.__tag_type == other.__tag_type:
            return True
        else:
            return False

    # Returns string representation of tag
    def __str__(self):
        if self.__tag_type == "start":
            return "String representation of Tag is : <" + self.__tag_name + "> ."
        elif self.__tag_type == "end":
            return "String representation of Tag is : </" + self.__tag_name + "> ."
        else:
            # Only the tag name is returned if the tag type is empty
            return "String representation of Tag is : " + self.__tag_name

Processing the input file

With the Tag class complete, you must now implement a function that will read an HTML file, extract the HTML tags and return them in a list. To do this you will need to complete the process_html_file() function in the HTML Processor abcd001.py file. Here is a simple algorithm you can follow:

- Create an empty list to store the tags you encounter.
- Read one character at a time from the data file, ignoring everything until you get to a "<" (ignore the "<" as well).
- Read one character at a time, appending them to a string until you get to a ">" or a white space (ignore the ">" and the white space as well).
- The string you have built is the name of the tag. Use this tag name to create an instance of the Tag class and append it to the list. Hint: end tags have a "/" before their name.

Once you have gone through the data file and completed your list of tags, return it.

Explanation / Answer

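import re

from bs4 import BeautifulSoup


def myfunc():
    # 'filePath' is a placeholder for the path to the HTML file
    with open(r'filePath', 'r') as report_file:
        raw_html = report_file.read()

    # Parse the document with Python's built-in HTML parser
    soup = BeautifulSoup(raw_html, 'html.parser')

    # Extract the URL fragment from the <meta> tag inside <noscript>
    meta_url = soup.noscript.meta['content']
    url = re.search('-/(.*)?', meta_url).group(1)

    print(url)
    print(soup.title.text)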
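
For the character-by-character algorithm described in the question, a minimal sketch using only the standard library might look like the following. It is not a definitive solution: the small Tag class here is a hypothetical stand-in with public attributes so that the block runs on its own, the tag type is passed as an index (0 = start, 1 = end, 2 = empty) as in the question's constructor, and treating a name that ends in "/" (such as "br/") as an empty tag is an assumption the assignment text does not spell out. Comments, doctype declarations, and "<" characters inside attribute values are not special-cased.

```python
import os
import tempfile

START, END, EMPTY = 0, 1, 2  # indices into the question's __TYPE_LIST


class Tag:
    """Hypothetical minimal stand-in for the Tag class in the question."""

    def __init__(self, tag_name, tag_type):
        self.name = tag_name
        self.type = ["start", "end", "empty"][tag_type]

    def __repr__(self):
        return f"Tag({self.name!r}, {self.type!r})"


def process_html_file(path):
    """Return the tags found in the file at `path`, in document order."""
    tags = []
    with open(path, "r") as html_file:
        ch = html_file.read(1)
        while ch:
            if ch == "<":
                # Build the tag name until ">" or whitespace (or end of file).
                name = ""
                ch = html_file.read(1)
                while ch and ch != ">" and not ch.isspace():
                    name += ch
                    ch = html_file.read(1)
                if name.startswith("/"):
                    # Hint from the question: end tags have "/" before the name.
                    tags.append(Tag(name[1:], END))
                elif name.endswith("/"):
                    # Assumption: "<br/>" counts as an empty tag.
                    tags.append(Tag(name[:-1], EMPTY))
                elif name:
                    tags.append(Tag(name, START))
            # Everything outside a tag name (text, attributes) is ignored.
            ch = html_file.read(1)
    return tags


if __name__ == "__main__":
    # Write a small sample document to a temporary file and process it.
    sample = "<html><body class='x'><p>Hi</p><br/></body></html>"
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
        tmp.write(sample)
    print(process_html_file(tmp.name))
    os.remove(tmp.name)
```

To use it with the Tag class from the question, drop the stand-in class and keep the same Tag(name, type_index) calls. Reading with read(1) follows the assignment's "one character at a time" wording; reading the whole file and iterating over the string would behave the same for small inputs.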