RSS feeds are a popular way to keep track of news items, blog postings and so on


Question

RSS feeds are a popular way to keep track of news items, blog postings and so on. For this problem we’ll be working with the news feed from the following URL: http://feeds.nytimes.com/nyt/rss/World

The root element of the RSS feed is called RSS, which has a child element CHANNEL, which in turn has a number of children including the ITEM element. Each ITEM element has a number of child elements such as TITLE, LINK, SOURCE, CATEGORY and so on, one set for each news item in the feed. This file is an example of a namespaced file. You can view the page source to see what the XML looks like.
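To make that structure concrete, here is a minimal sketch that parses a made-up one-item sample rather than the live feed. The sample XML, the names `SAMPLE` and `DC_NS`, and the `nyt_geo` domain value are assumptions for illustration only; the real page source should be checked for the exact tags and attributes. It also shows why a namespaced element such as `dc:creator` is not found by its bare name in minidom.

```python
from xml.dom.minidom import parseString

# Made-up sample that mimics the rss > channel > item layout (not real feed data).
SAMPLE = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline</title>
      <dc:creator>By A. WRITER</dc:creator>
      <category domain="http://www.nytimes.com/namespaces/keywords/nyt_geo">Ukraine</category>
    </item>
  </channel>
</rss>"""

# Namespace URI bound to the "dc" prefix (Dublin Core).
DC_NS = "http://purl.org/dc/elements/1.1/"

root = parseString(SAMPLE).documentElement
channel = root.getElementsByTagName("channel")[0]
items = channel.getElementsByTagName("item")
first = items[0]

# Un-prefixed elements are matched by their plain tag name.
title = first.getElementsByTagName("title")[0].firstChild.data

# A prefixed element is not matched by its local name alone ...
by_plain_name = first.getElementsByTagName("creator")
# ... but is matched by namespace URI plus local name.
by_namespace = first.getElementsByTagNameNS(DC_NS, "creator")
creator = by_namespace[0].firstChild.data

print(root.tagName, len(items), title, creator, len(by_plain_name))
```

Looking elements up by namespace URI rather than by the literal `dc:` prefix keeps the lookup working even if a feed binds the same namespace to a different prefix.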

Write a program that reads the content from the URL above using XML methods and prints out the following information:

b. (10 points) Modify your output by adding the geographical regions (usually, but not always, countries) associated with the news item in the following format:

Israel, Egypt, Gaza Strip, West Bank : Op-Ed Contributor: Gaza and Israel: The Road to War, Paved by the West ( By NATHAN THRALL )
China : Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China ( By IAN JOHNSON )
Ukraine : Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )
Kabul (Afghanistan), Afghanistan : Afghanistan Begins Audit of Presidential Election ( By MATTHEW ROSENBERG )
Etc.

Entries from the ‘Sinosphere Blog’ may not have properly formatted tags for region, so just put down “China” if they don’t list a proper region tag.
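The output format and the Sinosphere fallback can be sketched as a small pure function. The name `format_item` and its three parameters are hypothetical, and the headlines in the usage lines are placeholders, not feed data.

```python
def format_item(regions, title, creator):
    """Return 'regions : title ( creator )' for one news item."""
    # Fallback from the problem statement: a Sinosphere Blog entry with
    # no usable region tag is listed under "China".
    if not regions and title.startswith("Sinosphere Blog"):
        regions = ["China"]
    return "%s : %s ( %s )" % (", ".join(regions), title, creator)


# Placeholder inputs to show the two cases.
print(format_item(["Israel", "Egypt"], "Some headline", "By A. WRITER"))
print(format_item([], "Sinosphere Blog: Some post", "By B. WRITER"))
```

Keeping the formatting separate from the XML lookup makes the fallback rule easy to check without downloading the feed.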

Here is my original code:

#!/usr/bin/python

from xml.dom.minidom import parse
import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("news.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("RSS"):
   print "Root element : %s" % collection.getAttribute("RSS")

# Get all the movies in the collection
news = collection.getElementsByTagName("item")

# Print detail of each movie.
for item in news:
   print "*****News*****"
   if item.hasAttribute("title"):
      print item.getAttribute("title")

creator = item.getElementsByTagName('creator')[0]
print " : %s" % type.childNodes[0].data

and original output:

Op-Ed Contributor: Gaza and Israel: The Road to War, Paved by the West ( By NATHAN THRALL )
Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China ( By IAN JOHNSON )
Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )
Afghanistan Begins Audit of Presidential Election ( By MATTHEW ROSENBERG )
Etc.

Should be in Python

Explanation / Answer

import requests
from bs4 import BeautifulSoup

# Download the feed.
request = requests.get('http://feeds.nytimes.com/nyt/rss/World')
# Use the 'xml' parser (requires lxml); an HTML parser treats <link> as an empty tag.
soup = BeautifulSoup(request.text, 'xml')
for dataitem in soup.find_all('item'):
    title = dataitem.find('title').text
    link = dataitem.find('link').text
    # An item may have no <comments> element, so guard against None.
    comments_tag = dataitem.find('comments')
    comments = comments_tag.text if comments_tag is not None else ''
    print(title + ' - ' + link + ' - ' + comments)  # one line per item

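#!/usr/bin/python

from xml.dom.minidom import parse

# Namespace URI of the "dc" prefix used by <dc:creator>.
DC_NS = "http://purl.org/dc/elements/1.1/"

def text_of(node):
    # Join the text children of an element into one string.
    return "".join(child.data for child in node.childNodes
                   if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)).strip()

# Parse a saved copy of the feed with the minidom parser.
DOMTree = parse("news.xml")
collection = DOMTree.documentElement
print("Root element : %s" % collection.tagName)

# Print "regions : title ( creator )" for each news item.
for item in collection.getElementsByTagName("item"):
    # title is a child element of item, not an attribute.
    title = text_of(item.getElementsByTagName("title")[0])
    # creator is namespaced (dc:creator), so look it up by namespace URI.
    creators = item.getElementsByTagNameNS(DC_NS, "creator")
    creator = text_of(creators[0]) if creators else ""
    # Assumption: region tags are <category> elements whose "domain"
    # attribute ends in "nyt_geo"; check the page source to confirm.
    regions = [text_of(cat) for cat in item.getElementsByTagName("category")
               if cat.getAttribute("domain").endswith("nyt_geo")]
    # Sinosphere Blog entries without a proper region tag default to China.
    if not regions and title.startswith("Sinosphere Blog"):
        regions = ["China"]
    print("%s : %s ( %s )" % (", ".join(regions), title, creator))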
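The minidom listing above reads a saved copy, `news.xml`, whereas the problem asks for the content to be read from the URL. One way to bridge that, sketched here for Python 3 with only the standard library, is to download the bytes with `urllib.request.urlopen` and hand them to `minidom.parseString`. The helper names `load_feed` and `item_titles` and the 10-second timeout are assumptions, not part of the original answer.

```python
from urllib.request import urlopen
from xml.dom.minidom import parseString

FEED_URL = "http://feeds.nytimes.com/nyt/rss/World"  # URL given in the question


def load_feed(url=FEED_URL, timeout=10):
    """Download the feed and return its minidom root element."""
    with urlopen(url, timeout=timeout) as response:
        return parseString(response.read()).documentElement


def item_titles(root):
    """Return the title text of every <item> under the given root."""
    titles = []
    for item in root.getElementsByTagName("item"):
        nodes = item.getElementsByTagName("title")
        # Skip items whose <title> is missing or empty.
        if nodes and nodes[0].firstChild is not None:
            titles.append(nodes[0].firstChild.data)
    return titles
```

Calling `item_titles(load_feed())` would then list the live headlines; the same root element can be passed to any per-item processing in place of the one parsed from `news.xml`.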
