Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

For this assignment, you are to write a Python program that will search the web

ID: 3706915 • Letter: F

Question

For this assignment, you are to write a Python program that will search the web for email addresses. You should write a function called _crawl(start, limit) that will return a list of all the unique email addresses found on all the webpages reachable from the webpage whose URL is start , except that you should stop if you have visited limit distinct web pages and return the list from that many pages. Types: start is a string. limit is an integer. You will return a list of strings (one string for each email address you find). Y

Remember to make your function work, and to use the email address finder that you need to add the following two lines to the start of your file:

import urllib.request as ur

import re

so it is find all url first, and go to contents, and find all unique address. there will be a 3 functions.

Explanation / Answer

import urllib, re

def _crawl(start, limit):
uniquemails = set()

#connect to url
website = urllib.urlopen(start)

#read html code
html = website.read()

#use re.findall to get all the links on this webpage
links = re.findall('"((http|ftp)s?://.*?)"', html)

for link in links:
w = urllib.urlopen(link)
str = f.read()

#use re.findall to get the emails on this webpage
emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,4}",str)

for email in emails:
uniquemails.add(email)

print uniquemails

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote