Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

i have to create internet news archive by using python with a GUI. have i docume

ID: 3874680 • Letter: I

Question

i have to create internet news archive by using python with a GUI. have i document with downloaded news from WSJ online regularly updated source. Under the user’s control, it extracts individual elements from web documents stored in an “archive” (a folder of previously downloaded HTML/XML documents) and uses these elements to construct an entirely new HTML document which summarises a particular days’ top stories in a form which can be viewed in a standard web browser. It also allows the user to download the current version of the source web page and store it in the archive as the “latest” news. i want to know how to create it? Development hints This is a substantial task, so you should not attempt to do it all at once. In particular, you should work on the “back end” parts first, designing your HTML document and working out how to extract the necessary elements for it from the archived web pages, before attempting the Graphical User Interface “front end”. If you are unable to complete the whole task, just submit those stages you can get working. You will receive partial marks for incomplete solutions. It is suggested that you use the following development process: 1. Find a suitable source of online news articles. Keep in mind that the web site you choose must be updated daily, must have at least ten articles online at any time, and each article must have a heading, a short synopsis, a (link to a) photograph, a link to a full description of the story, and a publication date. Some starting points for finding such sites can be found in Appendix A below. 2. Create your “Internet Archive” by downloading seven copies of the web site, one per day. You should do so using the provided downloader.py program or a similar Python application so that the documents in your archive have the same structure that your application will see when downloading the “latest” news. (Otherwise you will have to write separate pattern matching code for extracting items from the archive and from the latest download!) 3. Study the HTML/XML source code of the archived documents to determine how the elements you want to extract are marked up. Typically you will want to identify the markup tags, and perhaps other unchanging parts of the document, that uniquely identify the beginning and end of the text and image addresses you want to extract. 4. Using the provided regex_tester.py application, devise regular expressions which extract just the necessary elements from the relevant parts of the archived web documents. 5. You can now develop a simple prototype of your “back end” function(s) that just extracts and saves the required elements from an archived web document. 6. Design the HTML source code for your “extracted” news stories, with appropriate placeholders for the downloaded web elements you will insert. Keep the document simple and its source code neat. 7. Develop the necessary Python code to extract the HTML elements from an archived document and create the HTML file. This completes the major “back end” part of your solution. 8. Develop a function to download the “latest” news and save it in the archive. (The provided downloader.py application is very helpful at this point!) 9. Develop a function to open a given HTML document in the host computer’s default web browser. Importantly, this needs to be done in a way that will work on any computing platform. The provided template file contains hints for how to do this, using Python functions for finding the “path” to the folder in which the Python script resides (getcwd), for “normalising” full file names to the local computing environment (normpath), and for opening a file in the host system’s default web browser (called webopen in our template). 10. Add the Graphical User Interface “front end” to your program. Decide whether you want to use push buttons, radio buttons, menus, lists or some other mechanism for choosing, extracting and displaying archived news. Developing the GUI is the “messiest” step, and is best left to the end.

Explanation / Answer

Answer:

1. Source of online new articles: You can select any leading news website like CNN (http://cnn.com/), BBC (http://bbc.co.uk/), WSJ (https://www.wsj.com/) etc. as soure of news articles.

2. Download news articles from web: For this purpose, 'newspaper' package of python can be used. If not already installed, it can be installed by using following command:

-------------------------------

pip install newspaper3k

-------------------------------

Use following code to download articles from a given source urls:

--------------------------------

#import newspaper package
import newspaper

#source url
source_url = "http://www.bbc.co.uk"

#news source
news_source = newspaper.build(source_url)

#download articles
num_articles = 10
count = 0
for article in news_source.articles:
    count = count+1
    print("Downloading article...",count)
    article.download()
    filename = "article"+str(count)+".html"
    file = open(filename,"w")
    file.write(article.html)
    file.close()
    print("Article downloaded!")  
    if count == num_articles:
        break

-----------------------------------------

3. Study the HTML/XML markup: This can be done by opening the downloaded html file in any text editor.

4. Regex expressions for extracting content from html: For extracting content, BeautifulSoup package in python can be used.

'