Question
I have an .xml file (shown below) that I want to parse in PySpark (Spark using Python) so that I can count the number of Id values in the file. For the file below, the count should be 3. I also need the parser to write an output file that contains every Id value. For example, the output file would be:
7
8
9
Can someone help me please?
<?xml version="1.0" encoding="utf-8"?>
<posthistory>
<row Id="7" PostHistoryTypeId="2" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior?

For example, if I wanted to "teach" a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.

Obviously, randomly generating code would be impractical, so how could I do this?" />
<row Id="8" PostHistoryTypeId="1" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="How can I do simple machine learning without hard-coding behavior?" />
<row Id="9" PostHistoryTypeId="3" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="<machine-learning>" />
Explanation / Answer
Method One
---------------
We can import this data by reading from a file:
print(id)  # finally, print the id
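One way to read the file in PySpark is as plain text, one line per record. The following is only a sketch under stated assumptions, not a definitive implementation: it assumes every <row ...> tag starts on its own line with its Id attribute on that same line (true for the sample in the question, even though the Text attribute of row 7 spans several lines). The names extract_id and count_ids_spark, and the paths in the usage note, are hypothetical.

```python
import re

# Matches the Id attribute of a <row ...> tag. The \b before "Id" keeps
# PostHistoryTypeId, PostId and UserId from matching.
ROW_ID = re.compile(r'<row\s[^>]*?\bId="([^"]*)"')


def extract_id(line):
    """Return the Id of the <row> tag found on this line, or None."""
    match = ROW_ID.search(line)
    return match.group(1) if match else None


def count_ids_spark(xml_path, out_dir):
    """Count the row Ids in xml_path and save them, one per line, under out_dir."""
    # Imported here so that extract_id can be tried without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("count-row-ids").getOrCreate()
    lines = spark.sparkContext.textFile(xml_path)

    # Keep only the lines that carry a <row Id="..."> tag.
    ids = lines.map(extract_id).filter(lambda value: value is not None).cache()

    number_of_ids = ids.count()
    # Spark writes a directory of part files; out_dir must not exist yet.
    ids.saveAsTextFile(out_dir)
    return number_of_ids
```

With hypothetical paths, count_ids_spark("posthistory.xml", "ids_out") would return the count and leave the Id values in the part files under ids_out. Reading line by line avoids loading the whole file on one machine, at the cost of relying on the layout assumption above rather than on a real XML parser.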
Method Two
--------------
# Python code to illustrate parsing of XML files with the standard library
# importing the required module
import xml.etree.ElementTree as ET

def parseXML(xmlfile):
    # create element tree object (the file must be well-formed XML,
    # including the closing </posthistory> tag)
    tree = ET.parse(xmlfile)
    # get root element (<posthistory>)
    root = tree.getroot()
    # create empty list for the Id values
    ids = []
    # iterate over the <row> elements
    for item in root.findall('row'):
        # Id is an attribute of <row>, not a child element,
        # so read it with get() instead of find()
        ids.append(item.get('Id'))
    # return the list of Id values
    return ids

def main():
    # parse xml file (replace the name with the path to your own file)
    ids = parseXML('topnewsfeed.xml')
    # print the count, then one Id per line
    print('number of id =', len(ids))
    for row_id in ids:
        print(row_id)

if __name__ == "__main__":
    # calling main function
    main()
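To check the ElementTree approach against the data in the question without needing a file on disk, a shortened in-memory copy can be parsed. This is a sketch under assumptions: the quotes and angle brackets inside Text are written as &quot;, &lt; and &gt; (a well-formed file has to escape them, so the real file presumably does), the closing </posthistory> tag is present, and the helper names row_ids and write_ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data with the Text values escaped.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<posthistory>
<row Id="7" PostHistoryTypeId="2" PostId="5" Text="a simple &quot;Hello World&quot; example

that spans several lines" />
<row Id="8" PostHistoryTypeId="1" PostId="5" Text="How can I do simple machine learning?" />
<row Id="9" PostHistoryTypeId="3" PostId="5" Text="&lt;machine-learning&gt;" />
</posthistory>
"""


def row_ids(xml_bytes):
    """Return the Id attribute of every <row> element, in document order."""
    root = ET.fromstring(xml_bytes)
    return [row.get("Id") for row in root.iter("row")]


def write_ids(ids, out_path):
    """Write one Id per line to out_path."""
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(ids) + "\n")


ids = row_ids(SAMPLE.encode("utf-8"))
print("number of id =", len(ids))  # prints: number of id = 3
```

Calling write_ids(ids, "ids.txt") (a hypothetical file name) would then produce the three-line output file the question asks for. A real XML parser handles the multi-line Text attribute correctly, but it needs the whole document in memory.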