Question
When a web crawler is exploring the Internet looking for content to index for a search engine, the crawler needs some way of detecting when it is visiting a copy of a website it has encountered before. Describe a way for a web crawler to store its web pages efficiently so that it can detect in O(n) time whether a web page of length n has been previously encountered and, if not, add it to the collection of previously encountered web pages in O(1) additional time. Explain clearly how your algorithm works.
Explanation / Answer
The crawl process begins with a list of web addresses gathered from past crawls and from sitemaps provided by website owners. As the crawler visits these pages, it follows links to discover further pages, paying special attention to new sites, changes to existing sites, and dead links. To avoid indexing the same content twice, the crawler needs a way to check quickly whether a fetched page is a duplicate, and a standard way to do this is with hashing. The crawler stores each previously encountered page in a hash table, keyed by a hash (fingerprint) of the page's contents. When it fetches a page of length n, it computes the hash of the full page in O(n) time and looks up the corresponding bucket in the table, which takes O(1) expected time. It then compares the new page character by character against any stored pages in that bucket; with a good hash function the expected number of such candidates is O(1), and each comparison takes at most O(n) time, so detection runs in O(n) time overall. If no match is found, the crawler inserts a reference to the page into the bucket, which costs only O(1) additional time beyond the hash already computed.
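A minimal sketch of this scheme in Python is shown below, assuming pages are handled as plain strings; the `PageStore` class and `seen_or_add` method are illustrative names for this answer, not part of any real crawler.

```python
import hashlib


class PageStore:
    """Collection of previously crawled pages, keyed by a content digest."""

    def __init__(self):
        # Digest -> list of full pages with that digest. The full text is
        # kept so that hash collisions can be ruled out by comparison.
        self._buckets: dict[str, list[str]] = {}

    def seen_or_add(self, page: str) -> bool:
        """Return True if `page` was encountered before; otherwise store it.

        Hashing the page costs O(n) for a page of length n; the table
        lookup and the insertion take O(1) expected additional time.
        """
        digest = hashlib.sha256(page.encode("utf-8")).hexdigest()  # O(n)
        bucket = self._buckets.setdefault(digest, [])
        # With a good hash function the bucket holds O(1) expected pages,
        # and each full comparison is O(n), so detection is O(n) overall.
        for stored in bucket:
            if stored == page:
                return True
        bucket.append(page)  # O(1) additional time beyond the hash
        return False


store = PageStore()
print(store.seen_or_add("<html>example page</html>"))  # False: first visit
print(store.seen_or_add("<html>example page</html>"))  # True: duplicate
```

In practice a crawler would store a URL or disk offset in each bucket rather than the page text itself, but the asymptotic costs are the same: O(n) to hash and verify, O(1) expected additional time to record a new page.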