
RSS feeds are a popular way to keep track of news items, blog postings and so on


Question

RSS feeds are a popular way to keep track of news items, blog postings and so on. For this problem we’ll be working with the news feed from the following URL: http://feeds.nytimes.com/nyt/rss/World


The root element of the RSS feed is called RSS, which has a child element CHANNEL, which in turn has a number of children including the ITEM elements. Each ITEM element has child elements such as TITLE, LINK, SOURCE, CATEGORY and so on for the news item it describes. This file is an example of a namespaced file. You can view the page source to see what the XML looks like.
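To make that structure concrete, here is a rough sketch that walks rss > channel > item with xml.etree.ElementTree. It runs on a small hand-written sample, not the live feed. The sample content is made up for illustration, and the assumption is that the author sits in a namespaced dc:creator element (Dublin Core namespace).

```python
import xml.etree.ElementTree as ET

# Hand-written sample that mimics the rss > channel > item layout described above.
# It is an illustration only, not a copy of the real feed.
SAMPLE = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Sample World News</title>
    <item>
      <title>Afghanistan Begins Audit of Presidential Election</title>
      <link>http://example.com/story</link>
      <dc:creator>By MATTHEW ROSENBERG</dc:creator>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map used when looking up namespaced child elements.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_items(xml_text):
    """Return (title, creator) pairs for every item under rss/channel."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    pairs = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        # Namespaced children need the prefix map to be resolved.
        creator = item.findtext("dc:creator", default="", namespaces=NS)
        pairs.append((title, creator))
    return pairs


for title, creator in list_items(SAMPLE):
    print("%s ( %s )" % (title, creator))
```

The point of the NS map is that a plain lookup for "creator" would not match the namespaced element; the prefix has to be resolved to its URI.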


Write a program that reads the content from the URL above using XML methods and prints out the following information:


b. (10 points) Modify your output by adding the geographical regions (usually, but not always, countries) associated with the news item in the following format:


Israel, Egypt, Gaza Strip, West Bank : Op-Ed Contributor: Gaza and Israel: The Road to War, Paved by the West ( By NATHAN THRALL )
China : Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China ( By IAN JOHNSON )
Ukraine : Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )
Kabul (Afghanistan), Afghanistan : Afghanistan Begins Audit of Presidential Election ( By MATTHEW ROSENBERG )
Etc.


Entries from the ‘Sinosphere Blog’ may not have properly formatted tags for region, so just put down “China” if they don’t list a proper region tag.

Here is my original code:

#!/usr/bin/python

from xml.dom.minidom import parse
import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("news.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("RSS"):
   print "Root element : %s" % collection.getAttribute("RSS")

# Get all the movies in the collection
news = collection.getElementsByTagName("item")

# Print detail of each movie.
for item in news:
   print "*****News*****"
   if item.hasAttribute("title"):
      print item.getAttribute("title")

   creator = item.getElementsByTagName('creator')[0]
   print " : %s" % type.childNodes[0].data

and original output:

Op-Ed Contributor: Gaza and Israel: The Road to War, Paved by the West ( By NATHAN THRALL )
Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China ( By IAN JOHNSON )
Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )
Afghanistan Begins Audit of Presidential Election ( By MATTHEW ROSENBERG )
Etc.


Should be in Python

Explanation / Answer

1)

import requests
from bs4 import BeautifulSoup

# Fetch the raw feed
request = requests.get('http://feeds.nytimes.com/nyt/rss/World')
# Parse as XML (needs lxml installed); an HTML parser treats <link> as an empty tag
soup = BeautifulSoup(request.text, 'xml')
dataItems = soup.find_all('item')
for dataitem in dataItems:
    title = dataitem.find('title').text
    link = dataitem.find('link').text
    # Not every item is guaranteed to have a <comments> element, so guard against None
    comments_tag = dataitem.find('comments')
    comments = comments_tag.text if comments_tag is not None else ''
    # Print title, link and comments for each item
    print(title + ' - ' + link + ' - ' + comments)

2)

#!/usr/bin/python

import xml.dom.minidom

# Parse a local copy of the feed saved as news.xml
DOMTree = xml.dom.minidom.parse("news.xml")
collection = DOMTree.documentElement
# The root element of an RSS feed is <rss>
print("Root element : %s" % collection.tagName)

# Get all the news items in the feed
news = collection.getElementsByTagName("item")

# Print the title and author of each news item
for item in news:
    # <title> is a child element of <item>, not an attribute
    title = item.getElementsByTagName("title")[0]
    line = title.childNodes[0].data

    # The author is in the namespaced <dc:creator> element, which may be absent
    creators = item.getElementsByTagName("dc:creator")
    if creators and creators[0].childNodes:
        line += " ( %s )" % creators[0].childNodes[0].data
    print(line)
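
For part b, here is a minimal sketch of one way to add the regions, using xml.etree.ElementTree instead of minidom and written for Python 3. Several things in it are assumptions, not confirmed by the question: that geographic regions appear as category elements whose domain attribute contains "nyt_geo", that the author is in dc:creator and already includes the "By" prefix, and the names fetch_feed, format_items and SAMPLE, which are made up for illustration. The sample data is hand-written so the block runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

FEED_URL = "http://feeds.nytimes.com/nyt/rss/World"
NS = {"dc": "http://purl.org/dc/elements/1.1/"}
# Assumption: geographic categories carry a domain attribute containing "nyt_geo".
GEO_MARKER = "nyt_geo"


def fetch_feed(url=FEED_URL):
    """Download the raw feed bytes (defined here but not called automatically)."""
    with urlopen(url) as response:
        return response.read()


def format_items(xml_data):
    """Return one 'regions : title ( creator )' line per item."""
    root = ET.fromstring(xml_data)
    lines = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="").strip()
        creator = item.findtext("dc:creator", default="", namespaces=NS).strip()
        # Keep only the categories that look like geographic tags.
        regions = [
            (cat.text or "").strip()
            for cat in item.findall("category")
            if GEO_MARKER in cat.get("domain", "")
        ]
        regions = [r for r in regions if r]
        # Fallback from the question: Sinosphere Blog entries default to China.
        if not regions and title.startswith("Sinosphere Blog"):
            regions = ["China"]
        lines.append("%s : %s ( %s )" % (", ".join(regions), title, creator))
    return lines


# Hand-written sample for illustration; the real feed may differ in detail.
SAMPLE = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <item>
      <title>Afghanistan Begins Audit of Presidential Election</title>
      <dc:creator>By MATTHEW ROSENBERG</dc:creator>
      <category domain="http://www.nytimes.com/namespaces/keywords/nyt_geo">Kabul (Afghanistan)</category>
      <category domain="http://www.nytimes.com/namespaces/keywords/nyt_geo">Afghanistan</category>
      <category domain="http://www.nytimes.com/namespaces/keywords/des">Elections</category>
    </item>
    <item>
      <title>Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China</title>
      <dc:creator>By IAN JOHNSON</dc:creator>
    </item>
  </channel>
</rss>"""

for line in format_items(SAMPLE):
    print(line)
```

To try it on the live feed, pass the result of fetch_feed() to format_items() in place of SAMPLE. If the feed marks regions differently, only GEO_MARKER and the category filter should need to change.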