

Question

Can someone who is not anonymous please reply to this question? Hlo responded before and did a great job, but please also code it in Python using the BFS and DFS algorithms. Also, please comment each line so I have a greater understanding.

Create a web crawler. Following are the parameters:

Input. The crawler should take as command-line input:
• the name of a file containing a list of seed URLs;
• the maximum total number of pages to crawl (an integer);
• the name of a directory in which to save the crawled pages, one page per file; and
• a string that indicates the crawling algorithm that should be used (either dfs for depth-first search or bfs for breadth-first search).

For example, an invocation might start crawling from the URLs in the file seeds.txt, visit at most 200 pages, save each page in the directory pages/, and use a breadth-first traversal. The seed file should be a list of URLs, one per line.
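As a rough starting point, here is a minimal sketch of how those four parameters might be read with argparse; the script name, the argument names, and the overall structure are assumptions for illustration, not part of the assignment.

[python]
# Sketch (assumed interface): parse the four required command-line inputs.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Simple web crawler")
    parser.add_argument("seed_file", help="file containing one seed URL per line")
    parser.add_argument("max_pages", type=int, help="maximum total number of pages to crawl")
    parser.add_argument("out_dir", help="directory in which to save the crawled pages")
    parser.add_argument("algorithm", choices=["bfs", "dfs"], help="traversal algorithm to use")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.seed_file, args.max_pages, args.out_dir, args.algorithm)
[/python]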

Output. Your crawler should produce two kinds of output:

It should write the HTML code for the pages that it discovers, one file per page, into the output directory specified on the command line. Many pages on different websites will have the same name (e.g. almost every site on the web has a file named index.html), so you'll have to generate unique file names for each of these pages so that they don't overwrite each other in your output directory. One simple approach is to name the files with consecutive integers; e.g. name the first file you download 0.html, the second 1.html, etc.
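A minimal sketch of that naming scheme (the function name is illustrative, and the html argument is assumed to hold the downloaded page source):

[python]
import os

# Sketch: save each downloaded page under a consecutive integer filename.
def save_page(html, out_dir, counter):
    os.makedirs(out_dir, exist_ok=True)           # create the output directory if needed
    filename = "%d.html" % counter                # 0.html, 1.html, 2.html, ...
    with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as f:
        f.write(html)
    return filename
[/python]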

It should also output a file called index.txt that lists the mapping between the URLs and the filenames you've assigned locally. The file should also record the time that each page was downloaded.
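The assignment's sample index.txt is not reproduced above, so the tab-separated layout in this sketch is an assumption; the idea is simply one line per page mapping the local filename to its URL and download time.

[python]
import os
import time

# Sketch: append one "filename<TAB>URL<TAB>timestamp" record to index.txt.
def record_index(out_dir, url, filename):
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(os.path.join(out_dir, "index.txt"), "a", encoding="utf-8") as f:
        f.write("%s\t%s\t%s\n" % (filename, url, timestamp))
[/python]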

Crawler politeness. Your crawler should behave in an ethical and polite way, i.e. avoid placing unwelcome load on the network. For this purpose, you must avoid sending too many requests in rapid succession to a server. Furthermore, your crawler should obey the Robots Exclusion Protocol by which a webmaster may elect to exclude any crawler from all or parts of a site (see tips section below for how to do this). When you fetch web pages, make sure to identify yourself using an appropriate User-Agent and From string. For the User-Agent, use SCB-I427-login where login is your user name. For the From string, use your full email address. Note that compliance with these identification steps and the Robots Exclusion Protocol is an absolute requirement and is necessary to comply with the Spelman network use policy. If you experience issues, set User-Agent to *.
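A minimal politeness sketch using only the standard library; the one-second delay, the placeholder email address, and the fallback behaviour when robots.txt cannot be read are assumptions, and you would substitute your own login and address in the identification strings.

[python]
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urljoin

USER_AGENT = "SCB-I427-login"          # replace "login" with your user name
FROM_ADDR = "you@spelman.edu"          # placeholder: use your full email address

def allowed_by_robots(url):
    # Check the site's robots.txt before fetching the page.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(urljoin(url, "/robots.txt"))
    try:
        rp.read()
    except OSError:
        return True                    # assumption: treat an unreachable robots.txt as allowing the fetch
    return rp.can_fetch(USER_AGENT, url)

def polite_fetch(url, delay=1.0):
    # Identify the crawler and wait between requests to avoid hammering servers.
    if not allowed_by_robots(url):
        return None
    request = urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "From": FROM_ADDR})
    time.sleep(delay)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
[/python]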

Crawling algorithms. As described above, the fourth parameter to the crawler specifies the traversal algorithm that should be used:

bfs should conduct a breadth-first traversal. Recall that this means that in any iteration of the crawler, it should visit the page that has been in the request queue the longest.

dfs should conduct a depth-first traversal. Recall that this means that the crawler should visit the page that was most recently added to the request queue.
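In crawler terms, the only difference between the two traversals is which end of the request queue (frontier) the next page is taken from. Here is a minimal sketch of that discipline; get_links is a hypothetical placeholder for whatever link-extraction function the crawler uses, and the names are illustrative only.

[python]
from collections import deque

# Sketch: the frontier holds URLs still to be visited.
# BFS removes the oldest entry (popleft); DFS removes the newest (pop).
def crawl_order(seed_urls, get_links, max_pages, algorithm):
    frontier = deque(seed_urls)
    visited = []
    while frontier and len(visited) < max_pages:
        if algorithm == "bfs":
            url = frontier.popleft()   # page that has waited the longest
        else:
            url = frontier.pop()       # page that was added most recently
        if url in visited:
            continue
        visited.append(url)
        for link in get_links(url):    # placeholder for fetching and parsing the page
            if link not in visited:
                frontier.append(link)
    return visited
[/python]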

Explanation / Answer

Source
[python]
# Adjacency list for the example graph; None marks a node with no outgoing edges.
GRAPH = {1: [2, 3], 2: [4, 5], 3: [6], 4: None, 5: [7, 8], 6: None, 7: None, 8: None}

def BFS(start, target, GRAPH):
    """Breadth-first search: use a QUEUE (first in, first out)."""
    print("Source:", start, "Target:", target)
    queue = [start]            # nodes waiting to be visited
    visited = []               # nodes already visited, in visit order

    while len(queue) > 0:
        x = queue.pop(0)       # take the node that has waited the longest

        if x == target:
            visited.append(x)
            return visited
        elif x not in visited:
            visited = visited + [x]
            if GRAPH[x] is not None:
                # add the neighbours at the END of the queue
                queue = queue + GRAPH[x]

    return visited

def DFS(start, target, GRAPH):
    """Depth-first search: use a STACK (last in, first out)."""
    print("Source:", start, "Target:", target)
    stack = [start]            # nodes waiting to be visited
    visited = []               # nodes already visited, in visit order

    while len(stack) > 0:
        x = stack.pop(0)       # the front of the list is the top of the stack

        if x == target:
            visited.append(x)
            return visited
        elif x not in visited:
            visited = visited + [x]
            if GRAPH[x] is not None:
                # add the neighbours at the TOP of the stack
                stack = GRAPH[x] + stack

    return visited

print("BFS Path", BFS(1, 7, GRAPH))
print("DFS Path", DFS(1, 7, GRAPH))
print("=" * 80)
print("BFS Path", BFS(1, 3, GRAPH))
print("DFS Path", DFS(1, 3, GRAPH))
[/python]

Output:
[bash]
$ python graph.py
Source: 1 Target: 7
BFS Path [1, 2, 3, 4, 5, 6, 7]
Source: 1 Target: 7
DFS Path [1, 2, 4, 5, 7]
================================================================================
Source: 1 Target: 3
BFS Path [1, 2, 3]
Source: 1 Target: 3
DFS Path [1, 2, 4, 5, 7, 8, 3]
[/bash]
