Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.


Question

Java algorithm: Use BFS to implement a crawler to crawl and discover the wikipedia graph. Called WikiCrawler.java

1 BFS and Web Graph

We can model the web as a directed graph: every web page is a vertex of the graph, and we put a directed edge from a page p to a page q if page p contains a link to page q. Recall the BFS algorithm from the lecture:

1. Input: a directed graph G = (V, E) and a root r in V.
2. Initialize a queue Q and a list visited.
3. Place root in Q and in visited.
4. While Q is not empty, do:
   (a) Let v be the first element of Q; remove it from Q.
   (b) For every edge (v, u) in E: if u is not in visited, add u to the end of Q and add u to visited.

If you output the vertices in visited, that is a BFS traversal of the input graph. We can use the BFS algorithm on the web graph as well. Here, the input to the algorithm is a seed URL (instead of the entire web graph); view that page as the root of the BFS traversal. Here is the BFS algorithm on the web graph.
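Before moving to the web-graph version, the plain BFS steps above can be sketched directly in Java. This is a minimal illustration on an in-memory adjacency-list graph; the class name BfsSketch, the String vertex labels, and the Map-based graph representation are my own choices for the example, not part of the assignment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BfsSketch {

    // Returns the vertices of graph in BFS order, starting from root.
    // graph maps each vertex to the list of vertices it points to.
    public static List<String> bfs(Map<String, List<String>> graph, String root) {
        List<String> visited = new ArrayList<>();  // the "visited" list from the steps above
        Set<String> seen = new HashSet<>();        // fast membership test for visited
        Deque<String> queue = new ArrayDeque<>();  // the queue Q

        // Place root in Q and in visited.
        queue.add(root);
        seen.add(root);
        visited.add(root);

        while (!queue.isEmpty()) {
            String v = queue.removeFirst();        // first element of Q
            for (String u : graph.getOrDefault(v, List.of())) {
                if (!seen.contains(u)) {           // if u is not in visited
                    queue.addLast(u);              // add u to the end of Q
                    seen.add(u);
                    visited.add(u);
                }
            }
        }
        return visited;                            // visited is the BFS traversal
    }
}
```

Outputting the accumulated visited list reproduces the traversal order the pseudocode describes; the crawler below follows the same scheme, with pages as vertices and extracted links as edges.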

Explanation / Answer

package proj2;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;

/**
 * Crawls and discovers the Wikipedia web graph with BFS, starting
 * from a seed URL.
 *
 * @author michael sila
 */
public class WikiCrawler {

    static final String BASE_URL = "https://en.wikipedia.org";

    private String seedUrl, fileName;
    private int maxPages;

    /**
     * See project description.
     *
     * @param seedUrl  relative address of the seed page, e.g. "/wiki/Physics"
     * @param max      maximum number of pages to include in the graph
     * @param fileName name of the output file
     */
    public WikiCrawler(String seedUrl, int max, String fileName) {
        this.seedUrl = seedUrl;
        this.maxPages = max;
        this.fileName = fileName;
    }

    /**
     * The scraping logic is in the LinkScrapper class (a separate class
     * that is not included in this answer).
     *
     * @param doc the contents of an HTML Wikipedia page
     * @return a list of HTML links (relative) found in doc
     */
    public ArrayList<String> extractLinks(String doc) {
        LinkScrapper extractor = new LinkScrapper(doc);
        return extractor.scrapeLink();
    }

    public void crawl() {
        HashSet<String> visited = new HashSet<>();
        LinkedList<String> queue = new LinkedList<>();
        HashSet<String> inGraph = new HashSet<>();
        queue.add(seedUrl);

        // Lines to write to the output file
        ArrayList<String> outputLines = new ArrayList<>();
        int currentEdges = 0;
        int timesPolled = 0;
        inGraph.add(queue.peekFirst());

        // Start BFS
        while (!queue.isEmpty()) {
            String currLink = queue.removeFirst();
            timesPolled++;
            if (timesPolled % 100 == 0) {
                // Sleep as required (politeness delay between request batches)
                try {
                    Thread.sleep(3000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            visited.add(currLink);
            ArrayList<String> neighbors = this.extractLinks(this.readHTML(currLink));
            for (String n : neighbors) {
                if (!visited.contains(n) && !inGraph.contains(n) && inGraph.size() < maxPages) {
                    // NOTE: the original answer is cut off at this condition;
                    // everything from here on is a reconstruction of the
                    // standard BFS bookkeeping, not the author's exact code.
                    queue.add(n);
                    inGraph.add(n);
                }
                if (inGraph.contains(n) && !n.equals(currLink)) {
                    outputLines.add(currLink + " " + n);
                    currentEdges++;
                }
            }
        }

        // Write the discovered graph to the output file: first the vertex
        // count, then one "source target" edge per line (reconstructed detail).
        try (PrintWriter writer = new PrintWriter(fileName)) {
            writer.println(inGraph.size());
            for (String line : outputLines) {
                writer.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Fetches the raw HTML of a page given its relative link.
     * (Reconstructed helper: the original answer is truncated before it,
     * but crawl() calls it.)
     */
    private String readHTML(String relativeLink) {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(BASE_URL + relativeLink).openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return html.toString();
    }
}