I have a .xml file below, I want to parse it in pyspark (Spark using python) so


Question

I have the .xml file below. I want to parse it in PySpark (Spark using Python) so that I can count the number of Id values in the file. For example, the file below should output number of id = 3 after counting. I also need the parser to write an output file that contains all the Id values. For example, the output file will be:

Can someone help me please?

<?xml version="1.0" encoding="utf-8"?>
<posthistory>
<row Id="7" PostHistoryTypeId="2" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple &quot;Hello World&quot; example - how can I avoid hard-coding behavior?

For example, if I wanted to &quot;teach&quot; a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.

Obviously, randomly generating code would be impractical, so how could I do this?" />
<row Id="8" PostHistoryTypeId="1" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="How can I do simple machine learning without hard-coding behavior?" />
<row Id="9" PostHistoryTypeId="3" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="&lt;machine-learning&gt;" />
</posthistory>

Explanation / Answer

Method One

---------------

We can import this data by reading from a file:

print(id)  # finally, print the id
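
For reference, here is a minimal sketch of how the read-and-count step might look in PySpark. It is not the code from the answer above; it assumes each <row ...> tag starts on its own line (as in the sample), and the names extract_ids, count_and_save_ids, ROW_ID and the app name are made up for illustration. It reads the file as plain text lines and pulls the Id attribute out with a regular expression, so it does not validate the XML.

```python
import re

# Matches the Id attribute that comes right after "<row". Because "<row"
# must come immediately before it, PostHistoryTypeId, PostId and UserId
# are not matched.
ROW_ID = re.compile(r'<row\s+Id="(\d+)"')


def extract_ids(line):
    """Return the Id values found on one line of text (usually zero or one)."""
    return ROW_ID.findall(line)


def count_and_save_ids(xml_path, out_dir):
    """Count the row Ids in xml_path and write them, one per line, to out_dir.

    Both arguments are caller-supplied paths (assumptions of this sketch).
    """
    # Imported here so extract_ids can be tried without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("count-row-ids").getOrCreate()
    # One RDD element per line of the file; flatMap keeps only the Ids.
    ids = spark.sparkContext.textFile(xml_path).flatMap(extract_ids).cache()
    total = ids.count()
    # Spark writes a directory of part files, not a single file.
    ids.saveAsTextFile(out_dir)
    spark.stop()
    return total
```

Calling count_and_save_ids with the path to the XML file and a not-yet-existing output directory should return the number of rows; the output directory will then hold part files with one Id per line.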

Method Two

--------------

# Python code to illustrate parsing of XML files

# import the required module
import xml.etree.ElementTree as ET


def parseXML(xmlfile):
    # create element tree object
    tree = ET.parse(xmlfile)

    # get root element (<posthistory>)
    root = tree.getroot()

    # create empty list for the Id values
    ids = []

    # iterate over the <row> elements
    for row in root.findall('row'):
        # Id is an attribute of <row>, not a child element,
        # so read it with get() rather than find()
        ids.append(row.get('Id'))

    # return the list of Id values
    return ids


def main():
    # parse xml file ('posthistory.xml' is a placeholder file name)
    ids = parseXML('posthistory.xml')

    # print the count, then one Id per line
    print('number of id =', len(ids))
    print('\n'.join(ids))


if __name__ == "__main__":
    # calling main function
    main()
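
As a quick check of the attribute-based approach, the sketch below parses an in-memory sample instead of a file. The Text values are shortened stand-ins for the real rows, the special characters are escaped as XML requires, and SAMPLE and ids_from_text are names made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the posted file: same Ids, shortened Text values.
SAMPLE = """<posthistory>
<row Id="7" PostHistoryTypeId="2" PostId="5" Text="first revision" />
<row Id="8" PostHistoryTypeId="1" PostId="5" Text="a title" />
<row Id="9" PostHistoryTypeId="3" PostId="5" Text="&lt;machine-learning&gt;" />
</posthistory>"""


def ids_from_text(xml_text):
    """Return the Id attribute of every <row> directly under the root."""
    root = ET.fromstring(xml_text)
    return [row.get("Id") for row in root.findall("row")]


ids = ids_from_text(SAMPLE)
print("number of id =", len(ids))  # number of id = 3
print("\n".join(ids))              # 7, 8 and 9, one per line
```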
