george.gu

XML Parser: DOM + XPath


There are many kinds of XML Parsers in Java:

  1. DOM (JDK embedded DOM implementation)
  2. SAX
  3. JDOM (It is an alternative to DOM and SAX)
  4. Digester (Jakarta commons Digester)
  5. JAXB (OXM; JAXB 2.0 implementation embedded in JDK 1.6)
  6. dom4j
  7. Xerces
  8. KXML
  9. ...
In fact, you can find more by searching for "xmlparser java". However, I list only the ones I know or have used in previous projects. I will cover them one by one; this post is only about DOM + XPath.

As usual, why am I revisiting such an "old" topic? I am refactoring a platform that parses the OMA DM DDF DTD with DOM, and I could not answer this question for myself:
  • What is the difference between a DOM Node and a DOM Element?

Node Types

After studying some documents that I am 100% sure I read several years ago, I got the old answer back:
  • The Node object represents a single node in the document tree.
  • There are many types of Node, each representing a particular part of an XML document's structure.

| NodeType | Description | Children Nodes |
|---|---|---|
| Element | Represents an element | Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference |
| Attr | Represents an attribute | Text, EntityReference |
| Text | Represents textual content in an element or attribute | None |
| CDATASection | Represents a CDATA section in a document (text that will NOT be parsed by a parser) | None |
| Document | Represents the entire document (the root-node of the DOM tree) | Element (max. one), ProcessingInstruction, Comment, DocumentType |

Node Types table (description and relationship with each other)

| NodeType | Named Constant | NodeType Constant | getNodeName() return | getNodeValue() return |
|---|---|---|---|---|
| Element | ELEMENT_NODE | 1 | Element name / tagName | null |
| Attr | ATTRIBUTE_NODE | 2 | Attribute name | Attribute value |
| Text | TEXT_NODE | 3 | #text | content of node |
| CDATASection | CDATA_SECTION_NODE | 4 | #cdata-section | content of node |
| Document | DOCUMENT_NODE | 9 | #document | null |

Node Types table (basic properties)
  • Element is a kind of Node; from the Java point of view, Element is a sub-interface of Node.
  • If a Node has getNodeType() == 1 (Node.ELEMENT_NODE), it is an Element.
  • Element.getTagName() returns the same value as Element.getNodeName().
Here I list only the node types commonly used in projects. For details on the other node types, please refer to the w3schools.com reference: http://www.w3schools.com/dom/dom_nodetype.asp.
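The Node/Element relationship can be checked with a few lines of code. The following is only a minimal sketch: the class name NodeVersusElement, the helper methods, and the sample XML are assumptions made for illustration. It inspects each node through the Node interface and casts to Element only when the node type allows it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class NodeVersusElement {

    // Describe a node using the Node interface; cast to Element only for node type 1.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                Element element = (Element) node; // safe: ELEMENT_NODE nodes implement Element
                return "Element <" + element.getTagName() + ">";
            case Node.TEXT_NODE:
                return "Text \"" + node.getNodeValue() + "\"";
            case Node.CDATA_SECTION_NODE:
                return "CDATA \"" + node.getNodeValue() + "\"";
            default:
                return "Other (" + node.getNodeName() + ")";
        }
    }

    // Parse an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this example.
        Document document = parse("<book id=\"1\">DOM<![CDATA[<raw>]]></book>");
        Element root = document.getDocumentElement();
        System.out.println(describe(document));
        System.out.println(describe(root));
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            System.out.println(describe(child));
        }
    }
}
```

Every Element is a Node, so describe() accepts both; the cast in the ELEMENT_NODE branch is what gives access to Element-only methods such as getTagName().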

JDK embedded DOM Parser

Normally we can get the entire XML Document object with the following Java code:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(false);
// factory.setSchema (myschema);
DocumentBuilder parser = factory.newDocumentBuilder();
// parser.setEntityResolver (new MyEntityResolver ());
// parser.setErrorHandler (new MyParseErrorHandler ());
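// parse() is overloaded: it accepts an InputStream, a String (URI), a File, or an InputSource.
org.w3c.dom.Document document = parser.parse(new java.io.File("input.xml")); // "input.xml" is a placeholder path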

Then you can call org.w3c.dom.Document.getDocumentElement() to get the root element. Why? From the Document row in the Node Types table, we know that a Document represents the entire document and contains exactly one Element node. That single Element is the root, and Document.getDocumentElement() returns it directly.

Once we have the root element, we can walk it to parse the XML data.
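To make the loading step concrete, here is a small self-contained sketch. The class name RootElementDemo, its helper methods, and the sample XML are assumptions for illustration only. It loads XML text with the JDK parser and shows that the Document may have several children (for example a comment), but only one of them is an Element, which is the one getDocumentElement() returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class RootElementDemo {

    // Load XML text into a DOM Document with the JDK's built-in parser.
    static Document load(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setNamespaceAware(false);
            DocumentBuilder parser = factory.newDocumentBuilder();
            return parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Count the Document's direct children that are elements.
    static int countTopLevelElements(Document document) {
        int count = 0;
        NodeList children = document.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample document invented for this example: a comment followed by the root element.
        String xml = "<?xml version=\"1.0\"?><!-- sample --><bookstore><book/></bookstore>";
        Document document = load(xml);
        Element root = document.getDocumentElement();
        System.out.println("Root element: " + root.getTagName());
        System.out.println("Children of Document: " + document.getChildNodes().getLength());
        System.out.println("Top-level elements: " + countTopLevelElements(document));
    }
}
```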

Useful interfaces for DOM parser

Only the key, frequently used methods are listed here; for the other methods, please refer to the Javadoc.

 

Node.getNodeType(): short

Gets the node type (see the Node Types tables above). This is very useful in a parser, because each node type exposes its data through different methods.

Node.getChildNodes(): NodeList

Get a NodeList that contains all children of this node.

Node.getAttributes(): NamedNodeMap

Get a NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise.

Node.getNodeName(): String

See previous Node Types table.

Node.getNodeValue(): String

See previous Node Types table.

Node.getTextContent(): String

Returns the text content of this node and its descendants.
If the current node is the root element, getTextContent() returns all the text content in the document.

Element.getElementsByTagName(String name): NodeList

Returns a NodeList of all descendant Elements with a given tag name, in document order.
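The methods listed above can be tried out together in a short sketch. The class DomAccessors, its helpers, and the bookstore sample are assumptions made for this example; only the org.w3c.dom calls come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class DomAccessors {

    // Sample document invented for this example.
    static final String XML =
            "<bookstore>"
            + "<book id=\"b1\" lang=\"en\"><title>DOM Basics</title><author>Ann</author></book>"
            + "<book id=\"b2\" lang=\"fr\"><title>XPath</title><author>Bob</author></book>"
            + "</bookstore>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // getElementsByTagName + getTextContent: text of every descendant with the given tag.
    static List<String> textOf(Element scope, String tagName) {
        List<String> result = new ArrayList<>();
        NodeList matches = scope.getElementsByTagName(tagName);
        for (int i = 0; i < matches.getLength(); i++) {
            result.add(matches.item(i).getTextContent());
        }
        return result;
    }

    // getAttributes: copy attribute names and values into a sorted map (empty for non-elements).
    static Map<String, String> attributesOf(Node node) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attributes = node.getAttributes(); // null unless node is an Element
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attr = attributes.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        Node firstBook = root.getChildNodes().item(0);
        System.out.println(textOf(root, "author"));   // [Ann, Bob]
        System.out.println(attributesOf(firstBook));  // {id=b1, lang=en}
        System.out.println(root.getTextContent());    // DOM BasicsAnnXPathBob
    }
}
```

The sample XML contains no whitespace between tags, so getChildNodes().item(0) is the first book element; with pretty-printed input the first child would usually be a whitespace Text node instead.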

Introspect XML with DOM defined methods

With the interfaces above, we can now parse the XML document node by node:

Step 1: Use DocumentBuilder to load the XML as a Document object.
Step 2: Get the root element and its list of child nodes.
Step 3: Loop over each node in the child list (upper level), check its node type, and parse it with your business logic. If the current node has child nodes (lower level), pause the current node, loop over the lower-level child nodes until all of them are processed, and then resume processing at the upper level.
Step 4: Once all child nodes are processed, you have reached the end of the XML document.

This is only a draft summary and will be updated later.
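The four steps map naturally onto a recursive method. The sketch below is one possible way to write it, not a definitive implementation: the class DomWalker, the outline format, and the decision to skip whitespace-only text and other node types are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class DomWalker {

    // Steps 1 and 2: load the document, take the root element, and start the walk.
    static String outline(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(document.getDocumentElement(), 0, out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Step 3: check the node type, handle the node, then descend into its children.
    static void walk(Node node, int depth, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) { // skip whitespace-only text between tags
                indent(out, depth).append("text: ").append(text).append('\n');
            }
        } else if (type == Node.ELEMENT_NODE) {
            indent(out, depth).append("element: ").append(node.getNodeName()).append('\n');
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // The recursive call "pauses" this level until the lower level is done.
                walk(children.item(i), depth + 1, out);
            }
        }
        // Other node types (comments, processing instructions, ...) are ignored in this sketch.
    }

    static StringBuilder indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        return out;
    }

    public static void main(String[] args) {
        // Step 4: when walk() returns for the root element, the whole document has been visited.
        System.out.print(outline("<a>\n  <b>x</b>\n  <c/>\n</a>"));
    }
}
```

In a real parser the two append calls would be replaced by the business logic for each node type; the recursion itself stays the same.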

Locate Node with XPath

With the previous design, we have to loop over a lot of nodes even if we only want the text of a single element such as:
/bookstore/category/country/book/author.
Thanks to XPath, we can locate an element easily by specifying it like a file path in a file system.
For more details on XPath syntax, please refer to w3schools.com: http://www.w3schools.com/xpath/default.asp.

You can create an XPath instance as follows:
XPathFactory factory = XPathFactory.newInstance(); 
XPath xpath = factory.newXPath(); // Create a new XPath instance

Then you can get a NodeList of the elements whose tag is "author":
XPathExpression xexpr = xpath.compile("/bookstore/category/country/book/author"); 
NodeList nodes = (NodeList) xexpr.evaluate(document, XPathConstants.NODESET);

You can also get a single node as follows:
Node node = (Node) xexpr.evaluate(document, XPathConstants.NODE);
But if you are not sure whether the target node is unique, get a NodeList instead of a single Node.
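Putting the XPath fragments together gives a complete program. This is a minimal sketch: the class XPathLookup, its helper methods, and the sample bookstore XML are assumptions for illustration, while the path expression is the one used in this post. It shows both return types: NODESET for all matches and NODE for a single match.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class XPathLookup {

    static final String AUTHOR_PATH = "/bookstore/category/country/book/author";

    // Sample document invented for this example.
    static final String XML =
            "<bookstore><category><country>"
            + "<book><author>Ann</author></book>"
            + "<book><author>Bob</author></book>"
            + "</country></category></bookstore>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    static XPathExpression compile(String path) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.compile(path);
    }

    // NODESET: every matching author element.
    static List<String> authors(Document document) {
        try {
            NodeList nodes = (NodeList) compile(AUTHOR_PATH).evaluate(document, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    // NODE: a single match, or null when nothing matches.
    static String firstAuthor(Document document) {
        try {
            Node node = (Node) compile(AUTHOR_PATH).evaluate(document, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse(XML);
        System.out.println(authors(document));
        System.out.println(firstAuthor(document));
    }
}
```

With two matching authors, the NODE variant returns only one of them (the first in document order with the JDK's default implementation), which is why the NODESET variant is the safer choice when uniqueness is not guaranteed.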

Advantages and Disadvantages:

As we can see from the DOM parsing methods, DOM loads the whole document into memory so that you can loop over its nodes easily. This can be an issue when you design a system that exchanges large XML documents. In that case, other XML parsers, such as Digester and other SAX-based parsers, could be an alternative.

But I still think DOM provides a flexible way to parse an XML definition with many self-referencing elements, such as the OMA DM DDF node: Node(Node+). Using DOM, we can write our own recursive parser as described in the section "Introspect XML with DOM defined methods".
Maybe some other parser has a better solution using a specific path expression that I do not know about, so I will look into it.
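As an illustration of the self-referencing case, here is a rough sketch of a recursive parser for nested Node elements. The sample XML is a heavily simplified, hypothetical DDF-like fragment (it is not valid against the real DTD), and the class NestedNodePaths and the slash-separated path format are assumptions made for this example. Note that it walks direct children only, because getElementsByTagName("Node") would also return nodes from deeper levels.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class NestedNodePaths {

    // Simplified, invented sample: each Node has a NodeName and may contain further Nodes.
    static final String XML =
            "<MgmtTree><Node><NodeName>Vendor</NodeName>"
            + "<Node><NodeName>App</NodeName>"
            + "<Node><NodeName>Version</NodeName></Node></Node>"
            + "<Node><NodeName>Name</NodeName></Node>"
            + "</Node></MgmtTree>";

    // Build one path per Node element, in document order.
    static List<String> paths(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        List<String> result = new ArrayList<>();
        for (Element node : childElements(document.getDocumentElement(), "Node")) {
            collect(node, "", result);
        }
        return result;
    }

    // Record this Node's path, then recurse into the Nodes nested directly inside it.
    static void collect(Element node, String parentPath, List<String> result) {
        List<Element> names = childElements(node, "NodeName");
        String name = names.isEmpty() ? "?" : names.get(0).getTextContent().trim();
        String path = parentPath.isEmpty() ? name : parentPath + "/" + name;
        result.add(path);
        for (Element nested : childElements(node, "Node")) {
            collect(nested, path, result);
        }
    }

    // Direct child elements with the given tag name (not all descendants).
    static List<Element> childElements(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(paths(XML)); // [Vendor, Vendor/App, Vendor/App/Version, Vendor/Name]
    }
}
```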