It seems like everyone needs to parse XML these days. They’re either saving their own information in XML or loading in someone else’s data. This is why I was glad to learn that as of Python 2.5, the ElementTree XML package has been added to the standard library in the XML module. What I like about the ElementTree module is that it just seems to make sense. This might seem like a strange thing to say about an XML module, but I’ve had to parse enough XML in my time to know that if an XML module makes sense the first time you use it, it’s probably a keeper. The ElementTree module allows me to work with XML data in a way that is similar to how I think about XML data. A subset of the full ElementTree module is available in the Python 2.5 standard library as In general, the ElementTree module treats XML data as a list of lists. All XML has a root element that will have zero or more subelements (or child elements). Each of those subelements may in turn have subelements of their own. The best way to think about this is with a brief example. First let’s take a look at some sample XML data: Here we have a root element with two child elements. Each child element has some text associated with it seen here as “one” and “two”. If we examine the XML as a hierarchical list of lists we see that we have one element “root” in our root list. Within the “root” element we have a list containing two subelements “child” and “child”. The two “child” elements would then contain empty lists representing their lack of subelements. Not too complicated so far, is it? Now let’s use the ElementTree package to parse this XML and print the text data associated with each child element. To start, we’ll create a Python file with the contents shown in Listing 1. This is basically a template that I use for many of my simple “*.py” files. It doesn’t actually do anything except set up the script so that when the file is run, the The first thing that we need to do in our Python code is import the ElementTreemodule: Note: If you are not using Python 2.5 and have installed the ElementTree module on your own, you should import the ElementTree module as follows: This will import the ElementTree section of the module into your program aliased as ET. However, you don’t have to import ElementTree using an alias; you can simply import it and access it as Now we’ll begin writing code in the The Be careful here! The Thankfully, the Element object is an iterator object so we can use a This will give us all the child elements in the root element. As mentioned earlier, each element in the XML tree is represented as an Element object, so as we iterate through the root element’s child elements we are getting Element objects with which to work. Meaning that each loop though the for loop will give us the next child element in the form of an Element object until there are no more children left. In order to print out the text associated with an Element object we simply have to access the Element object’s To recap, have a look at the code in Listing 2. Once you run the code you should get the following output: If an XML element does not have any text associated with it, like our root element, the Element object’s Let’s alter the XML that we are working with to add attributes to the elements and look at how we would parse that information. If the XML uses attributes in addition to, or instead of, inner text they can be accessed using the Element object’s When you run the code you get the following output: These are the attributes for each child element stored in a dictionary. Being able to work with an XML element’s attributes as a Python dictionary is a great feature and fits well with the dynamic nature of XML attributes. Now that we’ve tried our hand at reading XML, let’s try creating some. If you understand the reading process, you should have no trouble understanding the creation process because it works in much the same manner. What we are going to do in this example is recreate the XML data that we were working with above. The first step is to create our After this code is executed, the variable The next step is to create the two child elements. There are two ways to do this. In the first method, if you know exactly what you are creating, it’s easiest to use the This will create a The second approach to creating a child element is to create an Element object separately (rather than a sub element) and append it to a parent Element object. The results are exactly the same – this is simply a different approach that may come in handy when creating your XML,or working with two sets of XML data. First we create an Element object in the same way that we created the root element: This creates the Pretty simple! Now, if we want to look at the contents of our To recap, have a look at the code in Listing 3. When you run this code you will get the following output: If you want to create the XML with attributes (as illustrated in the second reading example), you can use the Element object’s You may also set attributes when you create Element objects: Most of the time you won’t be working with XML data that you explicitly create in your code, instead you will usually read the XML data in from a data source, work with it, and then save it back out when you are done. Fortunately, configuringElementTree to work with different data sources is very easy. For example, let’s take the XML data that we first used and save it into a file named There are a few methods that we can use to load XML data from a file. We are going to use the The first thing that we need in order to load the XML data is determine the full path to the We then call the However, since we only want the directory name (not the full path and filename of our Python source file) we have to strip off the filename: Now that we have the directory in which the we will use the os module to join the two paths so that the resulting path is always correct regardless of what operating system our code is executed on: Note: If you have any trouble understanding what any of the code used to determine the path of We now have the full path to the We now have an ElementTree object instance that represents our XML file. Since we are working with files, we should watch out for incorrect paths, I/O errors, or the parse function failing for any other reason. If you wish to be extra careful, you can wrap the parse function in a try/except block in order to catch any exceptions that may be thrown: In the except block, I catch the Exception base class so that I catch any and all exceptions that may be thrown (in the case of a missing file it will most likely be an Now that we know how to read in XML data, we should look at how one writes XML data out to a file. Let’s assume that after reading in the Notice that in order to add a child to the root element we used the ElementTree object’s Now that we have a third child element, let’s write the XML data back out to That’s it! If we want to be really careful when writing the XML data out to a file, we’ll watch out for exceptions. However most of the time the To recap you can find all of the code from this section in Listing 4. When you run the code and take a look at the Introduction
xml.etree
, but you don’t have to use Python 2.5 in order to use theElementTree module. If you are still using an older version of Python (1.5.2 or later) you can simply download the module from its website and manually install it on your system. The website also has very easy to follow installation instructions, which you should consult to avoid issues while installing ElementTree.<root>
<child>One</child>
<child>Two</child>
</root>
Reading XML data
#!/usr/bin/env python
def main():
pass
if __name__ == "__main__":
main()
main
method will be executed. Some people like to use the Python interactive interpreter for simple hacking like this. Personally, I prefer having my code stored in a handy file so I can make simple changes and re-run the entire script when I am just playing around.from xml.etree import ElementTree as ET
from elementtree import ElementTree as ET
ElementTree
. Using ET is demonstrated in the Python 2.5 “What’s new” documentation[1] and I think it’s a great way to eliminate some key strokes.main
method. The first step is to load the XML data described above. Normally you will be working with a file or URL; for now we want to keep this simple and load the XML data directly from the text:element = ET.XML(
"<root><child>One</child><child>Two</child></root>")
XML
function is described in the ElementTree documentation as follows: “Parses an XML document from a string constant. This function can be used to embed “XML literals” in Python code”[2].XML
function returns an Element object, and not an ElementTree object as one might expect. Element objects are used to represent XML elements, whereas the ElementTree object is used to represent the entire XML document. Element objects may represent the entire XML document if they are the root element but will not if they are a subelement. ElementTree objects also add “some extra support for serialization to and from standard XML.”[3] The Element object that is returned represents the element in our XML data.
for
loop to loop through all of its child elements:for subelement in element:
text
attribute:for subelement in element:
print subelement.text
#!/usr/bin/env python
from xml.etree import ElementTree as ET
def main():
element = ET.XML("<root><child>One</child><child>Two</child></root>")
for subelement in element:
print subelement.text
if __name__ == "__main__":
# Someone is launching this directly
main()
One
Two
text
attribute will be set to None
. If you want to check if an element had any text associated with it, you can do the following:if element.text is not None:
print element.text
Reading XML Attributes
attrib
attribute. The attrib
attribute is a Python dictionary and is relatively easy to use:def main():
element = ET.XML(
'<root><child val="One"/><child val="Two"/></root>')
for subelement in element:
print subelement.attrib
{'val': 'One'}
{'val': 'Two'}
Writing XML
element:
#create the root <root>
root_element = ET.Element("root")
root_element
is an Element object, just like the Element objects that we used earlier to parse the XML.SubElement
method, which creates an Element object that is a subelement (or child) of another Element object:#create the first child <child>One</child>
child = ET.SubElement(root_element, "child")
Element that is a child of
root_element
. We then need to set the text associated with that element. To do this we use the same text attribute that we used in the first parsing example. However, instead of simply reading the text attribute we set its value:child.text = "One"
#create the second child <child>Two</child>
child = ET.Element("child")
child.text = "Two"
child
Element object and sets its text to “Two”. We then append it to the root element:#now append
root_element.append(child)
root_element
(or any other Element object for that matter) we can use the handy tostring
function. It does exactly what it says that it does: it converts an Element object into a human readable string.#Let's see the results
print ET.tostring(root_element)
#!/usr/bin/env python
from xml.etree import ElementTree as ET
def main():
#create the root </root><root>
root_element = ET.Element("root")
#create the first child <child>One</child>
child = ET.SubElement(root_element, "child")
child.text = "One"
#create the second child <child>Two</child>
child = ET.Element("child")
child.text = "Two"
#now append
root_element.append(child)
#Let's see the results
print ET.tostring(root_element)
if __name__ == "__main__":
# Someone is launching this directly
main()
</root><root><child>One</child><child>Two</child></root>
Writing XML Attributes
set
method. To add the val
attribute to the first element, use the following:child.set("val","One")
child = ET.Element("child", val="One")
Reading XML Files
our.xml
in the same location as our Python file.parse
function. This function is nice because it will accept, as a parameter, the path to a file OR a “file-like” object. The term “file-like” is used on purpose because the object does not have to be a file object per se – it simply has to be an object that behaves in a file-like manner. A “file-like” object is an object that implements a “file-like” interface, meaning that it shares many (if not all) methods with the file object. If an object is “file-like” this fact will usually be prominently mentioned in its documentation.our.xml
file. In order to calculate this, we determine the full path of our Python source file, strip the filename from it, and then append our.xml
to the path. This is rather simple given that the __file__
attribute (available in Python 2.2 and later) is the relative path and filename of our Python source file. Although the __file__
attribute will be a relative path, we can use it to calculate the absolute path using the standard os module:import os
abspath
function to get the absolute path:xml_file = os.path.abspath(__file__)
xml_file = os.path.dirname(xml_file)
our.xml
file resides, all we have to do is append the our.xml
filename to the xml_file
variable. However, instead of just doing something like:xml_file += "/our.xml"
xml_file = os.path.join(xml_file, "our.xml")
our.xml
is doing, try printing out xml_file
after each of the above lines and it should become clear.our.xml
file. In order to load its XML data we simply pass the path to the parse
function:tree = ET.parse(xml_file)
try:
tree = ET.parse("sar")
except Exception, inst:
print "Unexpected error opening %s: %s" % (xml_file, inst)
return
IOError
exception).Writing XML Data to a File
out.xml
fiie we want to add another item to the XML file that we just read in:child = ET.SubElement(tree.getroot(), "child")
child.text = "Three"
getroot
function. The getroot
function simply returns the root Element object of the XML data.our.xml
. Thanks to ElementTree this is a painless experience:tree.write(xml_file)
write
method will succeed without throwing an exception; it is more important to be sure that the path used is correct. Often times, instead of getting the exception that you want, you end up with an XML file stored in some far off and strange location on your hard drive because your path was incorrect or you did not specify the full path. But, as is often the case when programming, better safe than sorry:try:
tree.write(xml_file)
except Exception, inst:
print "Unexpected error writing to file %s: %s" % (xml_file, inst)
return
#!/usr/bin/env python
from xml.etree import ElementTree as ET
import os
def main():
xml_file = os.path.abspath(__file__)
xml_file = os.path.dirname(xml_file)
xml_file = os.path.join(xml_file, "our.xml")
try:
tree = ET.parse(xml_file)
except Exception, inst:
print "Unexpected error opening %s: %s" % (xml_file, inst)
return
child = ET.SubElement(tree.getroot(), "child")
child.text = "Three"
try:
tree.write(xml_file)
except Exception, inst:
print "Unexpected error writing to file %s: %s" % (xml_file, inst)
return
if __name__ == "__main__":
# Someone is launching this directly
main()
our.xml
file you should see that the the third child element has been added:<root>
<child>One</child>
<child>Two</child>
<child>Three</child>
</root>
相关推荐
在Python编程中,处理XML和Excel文件是常见的任务,尤其在数据处理和分析领域。XML(eXtensible Markup Language)是一种结构化...通过学习和实践,你将能够熟练地使用Python处理XML和Excel文件,从而提高工作效率。
本篇文章将深入探讨如何利用Python有效地处理XML文件,基于提供的"PythonXML.pdf"文档,我们将覆盖以下几个关键知识点: 1. **XML基础知识**:XML的设计目标是传输和存储数据,而非显示数据。XML文档由元素、属性、...
在深入学习Python时,你会接触到许多核心概念和实用技术,其中包括XML和HTML的处理。本篇文章将详细探讨这些主题。 首先,Python的“内置数据类型”是理解其强大功能的基础。这些类型包括整数(int)、浮点数(float)...
在`read_xml.py`中,我们将学习如何读取和解析XML文件。以下是一个基本的读取示例: ```python import xml.etree.ElementTree as ET # 解析XML文件 tree = ET.parse("input.xml") root = tree.getroot() # 遍历...
通过学习这些代码,你可以理解如何根据实际需求处理XML数据。实践中,可能涉及更复杂的操作,如处理命名空间、处理事件驱动的解析(SAX解析),或者使用第三方库如lxml来提升性能。但基本的XML解析和操作都可以通过`...
学习这个脚本可以帮助你更好地理解如何在实际项目中应用Python处理XML文件。 总结来说,Python的`xml.etree.ElementTree`模块提供了一套强大的工具,使得XML文件的读写、修改和查询变得简单易行。无论你是新手还是...
总结,这个压缩包提供的工具展示了Python在数据处理和文件操作上的强大能力,涉及了Python基础、文件I/O、XML解析、MAT文件操作等多个重要知识点,这些都是在进行数据分析、机器学习或计算机视觉项目时经常遇到的...
XML-RPC(XML Remote Procedure Call)是一种基于XML的...这是一份很好的学习资料,对于理解XML-RPC工作原理和Python中的实现非常有帮助。通过研究这些代码,你可以深入理解网络通信、数据序列化和远程调用等核心概念。
在深入探讨Python机器学习之前,首先要明白Python是一种高级编程语言,因其简洁明了的语法和强大的库支持而备受青睐,特别是在数据分析和机器学习领域。Python不仅适合初学者,也适用于专业人士,它能够处理从简单的...
Python之XML编程是关于使用Python语言处理XML文档的专题,XML(eXtensible Markup Language)是一种用于存储和传输数据的标记语言,广泛应用于数据交换、配置文件、以及Web服务等场景。Python作为一门功能强大的脚本...
总的来说,这个项目提供了一个实践性的示例,展示了Python如何优雅地处理XML数据,这对于任何希望在Python中进行数据处理或集成不同系统的人来说都是宝贵的学习资源。通过研究这个项目,你可以提升你的XML解析技能,...
`xml.etree.ElementTree`库提供了这些功能,只需适当学习和理解即可灵活运用。 通过参考提供的教程链接(https://blog.csdn.net/tanghong1996/article/details/88657307),你可以找到更多实例和详细解释,进一步...
Python是一种强大的编程语言,尤其在处理数据和自动化任务方面,如批量修改XML文件。XML(eXtensible Markup Language)是一种用于标记数据的通用格式,...这个压缩包提供的资源为学习和实践XML处理提供了很好的起点。
在 cette 小节中,我们将学习如何使用 ElementTree 模块处理 XML 文档。 ElementTree 模块的主要功能是将 XML 文档解析成树形结构,使得开发者可以方便地访问和操作 XML 元素。ElementTree 模块提供了多种方法来...
XML(eXtensible Markup ...通过深入学习XML,你可以掌握数据表示的灵活性,更好地进行数据交换和程序设计。这个资料包“XML初学进阶”可能包含基础教程、实战案例和相关工具的使用指南,是你提升XML技能的好助手。
在Python编程领域,XML(eXtensible Markup Language)是一种常用的数据交换格式,常用于存储和传输结构化数据。为了方便地处理XML文档,出现了许多库,其中之一就是"Requests-XML"。这个库将流行的HTTP客户端...
### lxml Python XML 解析器知识点概述 #### 一、lxml简介 - **定义与功能**:`lxml`是一种使用Python编写的高效且灵活的XML解析器与处理库。它能够快速处理XML文档,并提供了丰富的功能来满足各种需求。 - **支持...