Mastering Python (98 Blogs) Become a Certified Professional

We often require to parse data written in different languages. Python Programming provides numerous libraries to parse or split data written in other languages. In this Python XML Parser Tutorial, you will learn how to parse XML using Python.

Here are all the topics that are covered in this tutorial:

What is XML?
Python XML Parsing Modules
xml.etree.ElementTree Module

Using parse() function
Using fromstring() function
Finding Elements of Interest
Modifying XML files
Adding to XML
Deleting from XML

xml.dom.minidom Module

Using parse() function
Using fromString() function
Finding Elements of Interest

So let’s get started. :)

What is XML?

XML stands for Extensible Markup Language. It is similar to HTML in its appearance but, XML is used for data presentation, while HTML is used to define what data is being used. XML is exclusively designed to send and receive data back and forth between clients and servers. Take a look at the following example:

EXAMPLE:

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>
   Two idly's with chutney
   </description>
    <calories>553</calories>
</food>
<food>
    <item name="breakfast">Paper Dosa</item>
    <price>$2.7</price>
    <description>
    Plain paper dosa with chutney
    </description>
    <calories>700</calories>
</food>
<food>
    <item name="breakfast">Upma</item>
    <price>$3.65</price>
    <description>
    Rava upma with bajji
    </description>
    <calories>600</calories>
</food>
<food>
    <item name="breakfast">Bisi Bele Bath</item>
    <price>$4.50</price>
    <description>
   Bisi Bele Bath with sev
    </description>
    <calories>400</calories>
</food>
<food>
    <item name="breakfast">Kesari Bath</item>
    <price>$1.95</price>
    <description>
    Sweet rava with saffron
    </description>
    <calories>950</calories>
</food>
</metadata>

The above example shows the contents of a file which I have named as ‘Sample.xml’ and I will be using the same in this Python XML parser tutorial for all the upcoming examples.

Python XML Parsing Modules

Python allows parsing these XML documents using two modules namely, the xml.etree.ElementTree module and Minidom (Minimal DOM Implementation). Parsing means to read information from a file and split it into pieces by identifying parts of that particular XML file. Let’s move on further to see how we can use these modules to parse XML data.

xml.etree.ElementTree Module:

This module helps us format XML data in a tree structure which is the most natural representation of hierarchical data. Element type allows storage of hierarchical data structures in memory and has the following properties:

Property	Description
Tag	It is a string representing the type of data being stored
Attributes	Consists of a number of attributes stored as dictionaries
Text String	A text string having information that needs to be displayed
Tail String	Can also have tail strings if necessary
Child Elements	Consists of a number of child elements stored as sequences

ElementTree is a class that wraps the element structure and allows conversion to and from XML. Let us now try to parse the above XML file using python module.

There are two ways to parse the file using ‘ElementTree’ module. The first is by using the parse() function and the second is fromstring() function. The parse () function parses XML document which is supplied as a file whereas, fromstring parses XML when supplied as a string i.e within triple quotes.

Using parse() function:

As mentioned earlier, this function takes XML in file format to parse it. Take a look at the following example:

EXAMPLE:

import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()

As you can see, The first thing you will need to do is to import the xml.etree.ElementTree module. Then, the parse() method parses the ‘Sample.xml’ file. The getroot() method returns the root element of ‘Sample.xml’.

When you execute the above code, you will not see outputs returned but there will be no errors indicating that the code has executed successfully. To check for the root element, you can simply use the print statement as follows:

EXAMPLE:

import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()
print(myroot)

OUTPUT: <Element ‘metadata’ at 0x033589F0>

The above output indicates that the root element in our XML document is ‘metadata’.

Using fromstring() function:

You can also use fromstring() function to parse your string data. In case you want to do this, pass your XML as a string within triple quotes as follows:

import xml.etree.ElementTree as ET
data='''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>
   Two idly's with chutney
   </description>
    <calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)
#print(myroot)
print(myroot.tag)

The above code will return the same output as the previous one. Please note that the XML document used as a string is just one part of ‘Sample.xml’ which I have used for better visibility. You can use the complete XML document as well.

You can also retrieve the root tag by using the ‘tag’ object as follows:

EXAMPLE:

print(myroot.tag)

OUTPUT: metadata

You can also slice the tag string output by just specifying which part of the string you want to see in your output.

EXAMPLE:

print(myroot.tag[0:4])

OUTPUT: meta

As mentioned earlier, tags can have dictionary attributes as well. To check if the root tag has any attributes you can use the ‘attrib’ object as follows:

EXAMPLE:

print(myroot.attrib)

OUTPUT: {}

As you can see, the output is an empty dictionary because our root tag has no attributes.

Find out our Python Training in Top Cities/Countries

India	USA	Other Cities/Countries
Bangalore	New York	UK
Hyderabad	Chicago	London
Delhi	Atlanta	Canada
Chennai	Houston	Toronto
Mumbai	Los Angeles	Australia
Pune	Boston	UAE
Kolkata	Miami	Dubai
Ahmedabad	San Francisco	Philippines

Finding Elements of Interest:

The root consists of child tags as well. To retrieve the child of the root tag, you can use the following:

EXAMPLE:

print(myroot[0].tag)

OUTPUT: food

Now, if you want to retrieve all first-child tags of the root, you can iterate over it using the for loop as follows:

EXAMPLE:

for x in myroot[0]:
     print(x.tag, x.attrib)

OUTPUT:

item {‘name’: ‘breakfast’}
price {}
description {}
calories {}

All the items returned are the child attributes and tags of food.

To separate out the text from XML using ElementTree, you can make use of the text attribute. For example, in case I want to retrieve all the information about the first food item, I should use the following piece of code:

EXAMPLE:

for x in myroot[0]:
        print(x.text)

OUTPUT:

Idly
$2.5
Two idly’s with chutney
553

As you can see, the text information of the first item has been returned as the output. Now if you want to display all the items with their particular price, you can make use of the get() method. This method accesses the element’s attributes.

EXAMPLE:

for x in myroot.findall('food'):
    item =x.find('item').text
    price = x.find('price').text
    print(item, price)

OUTPUT:

Idly $2.5
Paper Dosa $2.7
Upma $3.65
Bisi Bele Bath $4.50
Kesari Bath $1.95

The above output shows all the required items along with the price of each of them. Using ElementTree, you can also modify the XML files.

Modifying XML files:

The elements present your XML file can be manipulated. To do this, you can use the set() function. Let us first take a look at how to add something to XML.

Adding to XML:

The following example shows how you can add something to the description of items.

EXAMPLE:

for description in myroot.iter('description'):
     new_desc = str(description.text)+'wil be served'
     description.text = str(new_desc)
     description.set('updated', 'yes')

mytree.write('new.xml')

The write() function helps create a new xml file and writes the updated output to the same. However, you can modify the original file as well, using the same function. After executing the above code, you will be able to see a new file has been created with the updated results.

set()-Python XML Parsing Tutorial-Edureka

The above image shows the modified description of our food items. To add a new subtag, you can make use of the SubElement() method. For example, if you want to add a new specialty tag to the first item Idly, you can do as follows:

EXAMPLE:

ET.SubElement(myroot[0], 'speciality')
for x in myroot.iter('speciality'):
     new_desc = 'South Indian Special'
     x.text = str(new_desc)

mytree.write('output5.xml')

OUTPUT:

As you can see, a new tag has been added under the first food tag. You can add tags wherever you want by specifying the subscript within [] brackets. Now let us take a look at how to delete items using this module.

Deleting from XML:

To delete attributes or sub-elements using ElementTree, you can make use of the pop() method. This method will remove the desired attribute or element that is not needed by the user.

EXAMPLE:

myroot[0][0].attrib.pop('name', None)

# create a new XML file with the results
mytree.write('output5.xml')

OUTPUT:

The above image shows that the name attribute has been removed from the item tag. To remove the complete tag, you can use the same pop() method as follows:

EXAMPLE:

myroot[0].remove(myroot[0][0])
mytree.write('output6.xml')

OUTPUT:

The output shows that the first subelement of the food tag has been deleted. In case you want to delete all tags, you can make use of the clear() function as follows:

EXAMPLE:

myroot[0].clear()
mytree.write('output7.xml')

OUTPUT:

When the above code is executed, the first child of food tag will be completely deleted including all the subtags. Till here we have been making use of the xml.etree.ElementTree module in this Python XML parser tutorial. Now let us take a look at how to parse XML using Minidom.

xml.dom.minidom Module:

This module is basically used by people who are proficient with DOM (Document Object module). DOM applications often start by parsing XML into DOM. in xml.dom.minidom, this can be achieved in the following ways:

Using the parse() function:

The first method is to make use of the parse() function by supplying the XML file to be parsed as a parameter. For example:

EXAMPLE:

from xml.dom import minidom
p1 = minidom.parse("sample.xml");

Once you execute this, you will be able to split the XML file and fetch the required data. You can also parse an open file using this function.

EXAMPLE:

dat=open('sample.xml')
p2=minidom.parse(dat)

The variable storing the opened file is supplied as a parameter to the parse function in this case.

Using parseString() Method:

This method is used when you want to supply the XML to be parsed as a string.

EXAMPLE:

p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')

You can parse XML using any of the above methods. Now let us try to fetch data using this module.

Finding Elements of Interest:

After my file has been parsed, if I try to print it, the output that is returned displays a message that the variable storing the parsed data is an object of DOM.

EXAMPLE:

dat=minidom.parse('sample.xml')
print(dat)

OUTPUT:

<xml.dom.minidom.Document object at 0x03B5A308>

Accessing Elements using GetElementByTagName:

EXAMPLE:

tagname= dat.getElementsByTagName('item')[0]
print(tagname)

If I try to fetch the first element using the GetElementByTagName method, I will see the following output:

OUTPUT:

Please note that just one output has been returned because I have used [0] subscript for convenience which will be removed in the further examples.

To access the value of the attributes, I will have to make use of the value attribute as follows:

EXAMPLE:

dat = minidom.parse('sample.xml')
tagname= dat.getElementsByTagName('item')
print(tagname[0].attributes['name'].value)

OUTPUT: breakfast

To retrieve the data present in these tags, you can make use of the data attribute as follows:

EXAMPLE:

print(tagname[1].firstChild.data)

OUTPUT: Paper Dosa

You can also split and retrieve the value of the attributes using the value attribute.

EXAMPLE:

print(items[1].attributes['name'].value)

OUTPUT: breakfast

To print out all the items available in our menu, you can loop through the items and return all the items.

EXAMPLE:

for x in items:
    print(x.firstChild.data)

OUTPUT:

Idly
Paper Dosa
Upma
Bisi Bele Bath
Kesari Bath

To calculate the number of items on our menu, you can make use of the len() function as follows:

EXAMPLE:

print(len(items))

OUTPUT: 5

The output specifies that our menu consists of 5 items.

This brings us to the end of this Python XML Parser Tutorial. I hope you have understood everything clearly.

Make sure you practice as much as possible and revert your experience.

Got a question for us? Please mention it in the comments section of this “Python XML Parser Tutorial” blog and we will get back to you as soon as possible. To know more you can enroll with our Master in Python programming course.

To get in-depth knowledge on Python along with its various applications, you can enroll for live Best Python Course Online with 24/7 support and lifetime access.

Upcoming Batches For Data Science with Python Certification Course

Course Name

Date

Data Science with Python Certification Course

Class Starts on 22nd April,2023

22nd April

SAT&SUN (Weekend Batch)

View Details

Data Science with Python Certification Course

Class Starts on 10th June,2023

10th June

SAT&SUN (Weekend Batch)

Data Science

How to Parse and Modify XML in Python?

What is XML?

Python XML Parsing Modules

xml.etree.ElementTree Module:

Using parse() function:

Using fromstring() function:

Finding Elements of Interest:

Modifying XML files:

Adding to XML:

Deleting from XML:

xml.dom.minidom Module:

Using the parse() function:

Using parseString() Method:

Finding Elements of Interest:

Recommended videos for you

Python Loops – While, For and Nested Loops in Python Programming

Linear Regression With R

Data Science : Make Smarter Business Decisions

Application of Clustering in Data Science Using Real-Time Examples

Web Scraping And Analytics With Python

Business Analytics with R

Python for Big Data Analytics

Diversity Of Python Programming

3 Scenarios Where Predictive Analytics is a Must

Business Analytics Decision Tree in R

Python Programming – Learn Python Programming From Scratch

Android Development : Using Android 5.0 Lollipop

The Whys and Hows of Predictive Modeling-II

Introduction to Business Analytics with R

Mastering Python : An Excellent tool for Web Scraping and Data Analysis

Python Classes – Python Programming Tutorial

Python Numpy Tutorial – Arrays In Python

Sentiment Analysis In Retail Domain

The Whys and Hows of Predictive Modelling-I

Know The Science Behind Product Recommendation With R Programming

Recommended blogs for you

Why Python Programming Language Is a Must Have Skill?

Python Tuple With Example: Everything You Need To Know

A Comprehensive Guide To Boosting Machine Learning Algorithms

A Beginner’s guide to “What is R Programming?”

How To Convert Decimal To Binary In Python

How To Avoid Indentation Error In Python

String Trimming in Python: All you Need to Know

Matplotlib Tutorial – Python Matplotlib Library with Examples

How to Find Length of List in Python?

How to find Square Root in Python?

All You Need To Know About Principal Component Analysis (PCA)

Lists In Python: Everything You Need To Know About Python Lists

Data Science Tutorial – Learn Data Science from Scratch!

Introduction to Functions in R

Data Science vs Machine Learning – What’s The Difference?

Confusion Matrix in Machine Learning : Your One Stop Solution

KNN Algorithm: A Practical Implementation Of KNN Algorithm In R

How to Find Prime Numbers in Python

Who uses R?

PHP Error Handling: All You Need To Know

Join the discussion Cancel reply

Trending Courses in Data Science

Data Science and Machine Learning Internship ...

Python Certification Training Course

Python Machine Learning Certification Trainin ...

Data Science with Python Certification Course

Data Analytics with R Programming Certificati ...

Data Science with R Programming Certification ...

SAS Training and Certification

Statistics Essentials for Analytics

Analytics for Retail Banks

Decision Tree Modeling Using R Certification ...

Browse Categories

Subscribe to our Newsletter, and get personalized recommendations.

How to Parse and Modify XML in Python?