Reading and Writing XML with Python

zc0604


Blog category:

python

<Part 1: Reading XML files with Python> Reposted from http://hi.baidu.com/heelenyc/blog/item/4062fd0b57c75294d1581b09.html

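Processing XML with Python

The Python open-source ecosystem is strong enough that there are usually many libraries for any given task, so developers face the problem of choosing the one that fits best. This article records the libraries I tested while processing XML data with Python; some of them cannot support certain features because of limitations in their design. The libraries and modules covered are xml (bundled with Python), libxml2, lxml, and xpath.

Note: the XML data processed in this article has the following structure:

Python code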

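input_xml_string = """
<root>
    <item>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </item>
    <other>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </other>
</root>
"""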

Python's built-in xml module

The getElementsByTagName interface provided by this module can be used to find the nodes we need. The example get_tagname is as follows:

Python code

import xml.dom.minidom

def get_tagname():
    doc = xml.dom.minidom.parseString(input_xml_string)
    for node in doc.getElementsByTagName("data"):
        print (node, node.tagName, node.getAttribute("version"))

The program prints:

(<DOM Element: data at 0x89884cc>, u'data', u'1.0')
(<DOM Element: data at 0x898860c>, u'data', u'2.0')
(<DOM Element: data at 0x89887cc>, u'data', u'1.0')
(<DOM Element: data at 0x898890c>, u'data', u'2.0')

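The result of get_tagname shows that getElementsByTagName finds every node named data. Sometimes, though, a program only needs the data nodes under one particular node, such as those under the other node. The obvious approach is to check whether the parent of each data node is other. The example get_tagname_other is as follows:

Python code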

import xml.dom.minidom

def get_tagname_other():
    doc = xml.dom.minidom.parseString(input_xml_string)
    for node in doc.getElementsByTagName("data"):
        if node.parentNode.tagName == "other":
            print (node, node.tagName, node.getAttribute("version"))

The program prints:

(<DOM Element: data at 0x936b7cc>, u'data', u'1.0')
(<DOM Element: data at 0x936b90c>, u'data', u'2.0')

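The result of get_tagname_other shows that this solves the problem. But if I want only the data node under other whose version attribute equals 1.0, more filtering logic has to be added, which is clearly not flexible enough. That leads to searching for nodes with XPath instead. The example get_xpath is as follows:

Python code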

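import xml.etree.ElementTree

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

file = StringIO(input_xml_string)

def get_xpath():
    doc = xml.etree.ElementTree.parse(file)
    # A leading ".//" searches all descendants; a bare "//" at the start
    # of the path triggers a FutureWarning in newer ElementTree versions.
    for node in doc.findall(".//item/data"):
        print (node, node.tag, (node.items()))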

The program prints:

(<Element data at 90c4dcc>, 'data', [('url', 'http://***'), ('version', '1.0')])
(<Element data at 90c4e8c>, 'data', [('url', 'http://***'), ('version', '2.0')])

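The result of get_xpath shows that XPath clearly improves readability, yet it still does not solve the problem described earlier (selecting a node by attribute value). This is because the xml module bundled with Python has inherently limited XPath support: ElementTree implements only a small subset of the XPath syntax. To get both readability and correct behavior, we need a third-party XML library for Python.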
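For reference, newer versions of the bundled ElementTree (1.3, shipped with Python 2.7 and 3.2 onward) accept simple attribute predicates such as [@version='1.0'], so this particular lookup can be written with the standard library alone. The following is a minimal Python 3 sketch, not part of the original article. It assumes the same XML structure as input_xml_string, and the names SAMPLE_XML and find_other_data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same shape as input_xml_string in this article (url values are placeholders).
SAMPLE_XML = """
<root>
  <item>
    <data version="1.0" url="http://***" />
    <data version="2.0" url="http://***" />
  </item>
  <other>
    <data version="1.0" url="http://***" />
    <data version="2.0" url="http://***" />
  </other>
</root>
"""


def find_other_data(xml_text, version):
    """Return <data> elements under <other> whose version attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree has no XPath variables, so the value is formatted in.
    # This is only safe for values that contain no quote characters.
    path = ".//other/data[@version='{}']".format(version)
    return root.findall(path)


matches = find_other_data(SAMPLE_XML, "1.0")
for node in matches:
    print(node.tag, sorted(node.items()))
```

Full XPath features (functions, axes, variable binding) are still missing from ElementTree, which is where the third-party libraries below come in.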

libxml2

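libxml2 is an XML parser written in C. It is free, open-source software released under the MIT License, and many programming languages have bindings built on it, such as the lxml module introduced later in this article. The example get_xpath_1 is as follows:

Python code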

import libxml2

def get_xpath_1():
    # data.xml has the same structure as input_xml_string above
    doc = libxml2.parseFile("data.xml")
    for node in doc.xpathEval("//item/data[@version = '1.0']"):
        print (node, node.name, (node.properties.name, node.properties.content))
    doc.freeDoc()

The program prints:

(<xmlNode (data) object at 0x9326c6c>, 'data', ('version', '1.0'))

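The result of get_xpath_1 meets our needs. One small shortcoming is that the xpathEval() interface does not support template-like usage (binding variables into the expression), though this does not get in the way. Because libxml2 is written in C, its API can feel somewhat foreign to Python developers in terms of coding style and habitual usage.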
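Without template support, values have to be spliced into the expression string by hand before it is passed to xpathEval(). The sketch below is one possible way to do that safely in Python 3, not something the original article uses. The helpers xpath_literal and data_by_version are hypothetical names. XPath 1.0 string literals have no escape syntax, so a value containing both quote styles is assembled with concat().

```python
def xpath_literal(value):
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return "'%s'" % value
    if '"' not in value:
        return '"%s"' % value
    # Both quote styles occur: split on single quotes and glue the pieces
    # back together with a double-quoted single quote inside concat().
    pieces = ["'%s'" % part for part in value.split("'")]
    return "concat(%s)" % ", \"'\", ".join(pieces)


def data_by_version(version):
    """Build the expression string that would be handed to doc.xpathEval()."""
    return "//item/data[@version = %s]" % xpath_literal(version)


print(data_by_version("1.0"))
```

The returned string can then be used in place of the hard-coded expression in get_xpath_1.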

lxml

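lxml is a Python binding built on top of the libxml2 library introduced above (together with libxslt); it is implemented mostly in Cython rather than pure Python. In use it suits Python developers better than libxml2 (in my experience), and its xpath interface supports template-like usage. The example get_xpath_2 is as follows:

Python code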

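import lxml.etree

def get_xpath_2():
    # Parse the string directly so the example does not depend on a
    # file-like object that an earlier example has already read to the end.
    doc = lxml.etree.fromstring(input_xml_string)
    for node in doc.xpath("//item/data[@version = $name]", name = "1.0"):
        print (node, node.tag, (node.items()))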

The program prints:

(<Element data at a1f784c>, 'data', [('version', '1.0'), ('url', 'http://***')])

xpath

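xpath is a third-party module (distributed as py-dom-xpath, not part of the standard library) that adds XPath support on top of the DOM nodes produced by Python's built-in xml modules, so the two combine well. Its find interface also supports template-like usage. The example get_xpath_3 is as follows:

Python code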

import xml.dom.minidom
import xpath

def get_xpath_3():
    doc = xml.dom.minidom.parseString(input_xml_string)
    for node in xpath.find("//item/data[@version = $name]", doc, name = "1.0"):
        print (node, node.tagName, node.getAttribute("version"))

The program prints:

(<DOM Element: data at 0x89934cc>, u'data', u'1.0')

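Summary

Having worked through these libraries, we have seen that Python offers a variety of choices for processing XML data, what each library is good at, and how each one is used. Choose the one that fits the actual requirements of the project.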

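<Part 2: Writing XML files with Python> Reposted from http://lulinbest.blog.sohu.com/75921823.html

I had previously used Python's minidom to write a program that generates XML files. Now that I needed to read the contents of an XML file, minidom was again the first module that came to mind. After some coding and testing I got the hang of its functions, which work much like the DOM operations used in AJAX.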
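For readers who have not used minidom this way, the DOM-style calls mentioned here look roughly like the following. This is a minimal Python 3 sketch rather than the author's original program. The element names are borrowed from the PurchaseOrder example later in this article, the structure is simplified, and the variable names are illustrative.

```python
from xml.dom.minidom import Document, parseString

# Build a small document with DOM-style calls.
doc = Document()
order = doc.createElement("PurchaseOrder")
doc.appendChild(order)

account = doc.createElement("account")
account.setAttribute("refnum", "2390094")
order.appendChild(account)

name = doc.createElement("name")
name.appendChild(doc.createTextNode("Potato Smasher"))
order.appendChild(name)

# Serialize the root element, then parse it back and read a value.
xml_text = order.toxml()
print(xml_text)

parsed = parseString(xml_text)
read_back = parsed.getElementsByTagName("name")[0].firstChild.data
print(read_back)
```

The createElement / appendChild / getElementsByTagName trio mirrors the browser DOM API, which is why the experience feels familiar to anyone who has scripted AJAX pages.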

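I had long known that elementtree is popular among Python programmers for processing XML, and I had installed the elementtree package before; the Python 2.5 I use now ships with it. Since I need to process XML files, it makes sense to learn the more efficient and easier-to-use module. After some exploration I have tried all of its functions except those dealing with namespaces, so handling XML files should be straightforward from now on.

Below is a simple example that shows how the individual functions are used: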

from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
from xml.etree.ElementTree import SubElement
from xml.etree.ElementTree import dump
from xml.etree.ElementTree import Comment
from xml.etree.ElementTree import tostring

'''
<?xml version="1.0"?>
<PurchaseOrder>
  <account refnum="2390094"/>
  <item sku="33-993933" qty="4">
    <name>Potato Smasher</name>
    <description>Smash Potatoes like never before.</description>
  </item>
</PurchaseOrder>
'''

## Writing the content to xml document
purchaseorder = Element('PurchaseOrder')
book = ElementTree(purchaseorder)   # pass the root element directly instead of calling the private _setroot()

SubElement(purchaseorder,  'account', {'refnum' : "2390094"})

item = Element("item", {'sku' : '33-993933', 'qty' : '4'})
purchaseorder.append(item)
print(item.items())      # [('sku', '33-993933'), ('qty', '4')]
print(item.attrib)       # {'sku': '33-993933', 'qty': '4'}
print(item.get('sku'))   # 33-993933
SubElement(item, 'name').text = "Potato Smasher"
SubElement(item, 'description').text = "Smash Potatoes like never before."

book.write('book.xml', "utf-8")   # write the file now so that ElementTree(file='book.xml') further down can read it

#print(tostring(purchaseorder))

#import sys
#book.write(sys.stdout)

#dump(book)

## Displaying the content of the xml document
print(purchaseorder.find('account'))
print(purchaseorder.find('account').get('refnum'))
print(purchaseorder.findall('account')[0].get('refnum'))

print(purchaseorder.find('item/name'))
print(purchaseorder.find('item/name').text)

## How to use ElementTree([element,] [file])
## 1. From standard XML element, it becomes root element
print(ElementTree(item).getroot().find('name').text)
## 2. From XML file
print(ElementTree(file='book.xml').getroot().find('item/description').text)


## Create an iterator
for element in purchaseorder.iter():    # iter() replaces getiterator(), which was removed in Python 3.9
    print(element.tag)


## Get pretty look
def indent(elem, level=0):
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        for e in elem:
            indent(e, level+1)
        if not e.tail or not e.tail.strip():
            e.tail = i
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = i
    return elem

if __name__=="__main__":
    dump(indent(purchaseorder))
    book.write('book.xml',"utf-8")
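
On Python 3.9 and later the standard library ships its own pretty-printer, xml.etree.ElementTree.indent(), which can stand in for a hand-written indent() helper. The following is a minimal Python 3 sketch, not part of the original post. It rebuilds part of the PurchaseOrder tree from the example above, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Rebuild a reduced version of the PurchaseOrder tree.
order = ET.Element("PurchaseOrder")
ET.SubElement(order, "account", {"refnum": "2390094"})
item = ET.SubElement(order, "item", {"sku": "33-993933", "qty": "4"})
ET.SubElement(item, "name").text = "Potato Smasher"

# ET.indent() rewrites whitespace in place (requires Python 3.9 or newer).
ET.indent(order, space="  ")

# encoding="unicode" makes tostring() return str instead of bytes.
pretty = ET.tostring(order, encoding="unicode")
print(pretty)
```

Unlike the indent() function above, ET.indent() returns None and only modifies the tree, so it cannot be nested inside a dump() call.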


2012-05-09 21:10