Question
PYTHON ONLY PLEASE!
1. Open four websites of your choice
a. Print the title of each website
2. Open the web page https://commons.wikipedia.org/wiki/Main_Page
a. Search for all the links that contain the word Category.
b. As you find each link, open the link and print the first 10 links on that page
c. Write all links to a text file
3. On the website http://www.gutenberg.org/files/ is a list of ebooks. The text of each book can be found by using the full address (with book number 2000 as an example): http://www.gutenberg.org/files/2000/2000.txt.
a. The user is asked to enter how many books to search for. Since the user does not know the numbers of the available books (such as the 2000 used above), each book number should be a random number selected from the range 2000 to 8000.
b. When a valid number is found, print the first 300 characters of the book text.
c. If no book matches the random number generated, print a message saying that the book with that number cannot be found. Do not count that unsuccessful try toward the total number of books being searched for: if the user entered five books to search for, the first 300 characters of five books should be printed.
Other requirements:
1. Use comments throughout
2. Each separate part must be in its own module. A main module will start when the program runs, and all other modules will be called from this main module.
Explanation / Answer
###Main.py###
#!/usr/bin/python
import titles
import links
# Now call the functions defined in those modules as follows
print("Reading Titles ")
urls = ['http://www.python.org/', 'http://www.google.com', 'http://www.bit.ly', 'http://www.bing.com']
titles.print_title(urls)
print(" Reading links with the word category ")
links.print_links("https://commons.wikipedia.org/wiki/Main_Page")
#so on
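# The third part (the Project Gutenberg search) would be imported and called
# here in the same way, e.g. books.search_books(); a hedged sketch of that
# module (books.py, an assumed name) is given after links.py below.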
###titles.py###
import re

try:
    # Python 3
    from urllib import request
except ImportError:
    # Python 2 fallback
    import urllib2 as request

def print_title(par):
    # Regular expression that captures the text inside the <title> tag of the head
    re_obj = re.compile(r'.*(<head.*<title.*?>(.*)</title>.*</head>)', re.DOTALL)
    for url in par:
        f = request.urlopen(url)
        data = ''
        while True:
            # Read the page in 4 KB chunks until the title has been seen
            b_data = f.read(4096)
            if not b_data:
                break
            data += b_data.decode(errors='ignore')
            match = re_obj.match(data)
            if match:
                title = match.groups()[1]
                print('title={}'.format(title))
                break
        f.close()
    return
###links.py###
import re

try:
    # Python 3
    from urllib import request
except ImportError:
    # Python 2 fallback
    import urllib2 as request

def print_links(par):
    # Connect to the URL
    website = request.urlopen(par)
    # Read the HTML source and decode it to text
    html = website.read().decode(errors='ignore')
    # Use re.findall to get all the http/https/ftp links on the page
    links = re.findall('"((http|ftp)s?://.*?)"', html)
    # re.findall returns (url, scheme) tuples because the pattern has two groups
    for url, _ in links:
        if 'Category' in url:
            print(url)
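The posted links.py covers only part 2a. Below is a minimal sketch of parts 2b and 2c: for every "Category" link on the main page it opens that link, prints the first 10 links found there, and writes every link to a text file. The module name category_links.py, the function name print_category_links, and the output file links.txt are assumptions and do not appear in the original answer.
###category_links.py###
import re

try:
    # Python 3
    from urllib import request
except ImportError:
    # Python 2 fallback
    import urllib2 as request

# Pattern for absolute http/https links inside href attributes
LINK_RE = re.compile(r'href="(https?://.*?)"')

def print_category_links(start_url, out_file='links.txt'):
    # Read the starting page and collect the links that contain "Category" (part 2a)
    html = request.urlopen(start_url).read().decode(errors='ignore')
    category_links = [url for url in LINK_RE.findall(html) if 'Category' in url]
    with open(out_file, 'w') as f:
        for cat_url in category_links:
            print(cat_url)
            f.write(cat_url + '\n')
            try:
                # Part 2b: open each Category link
                page = request.urlopen(cat_url).read().decode(errors='ignore')
            except Exception:
                continue  # skip Category links that fail to open
            # Print the first 10 links on that page and record them (part 2c)
            for link in LINK_RE.findall(page)[:10]:
                print('    ' + link)
                f.write(link + '\n')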
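The "#so on" in Main.py corresponds to part 3, which the posted answer omits. Below is a minimal sketch of that module; the module name books.py and the function name search_books are assumptions, not part of the original answer.
###books.py###
import random

try:
    # Python 3
    from urllib import request
    from urllib.error import HTTPError, URLError
except ImportError:
    # Python 2 fallback
    import urllib2 as request
    from urllib2 import HTTPError, URLError

def search_books():
    # Part 3a: ask the user how many books to search for
    count = int(input('How many books do you want to search for? '))
    found = 0
    while found < count:
        # Book numbers are picked at random from 2000 to 8000
        number = random.randint(2000, 8000)
        url = 'http://www.gutenberg.org/files/{0}/{0}.txt'.format(number)
        try:
            text = request.urlopen(url).read().decode(errors='ignore')
        except (HTTPError, URLError):
            # Part 3c: report the miss and do not count it toward the total
            print('The book with number {} cannot be found.'.format(number))
            continue
        # Part 3b: a valid number was found, so print the first 300 characters
        print(text[:300])
        found += 1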