
How To Parse XML File Using XPath In Java


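I have recently been learning how to parse XML with XPath. It is said to be a very simple tool for traversing XML files, with a relationship to XML similar to that of SQL to Oracle. I searched a lot but could not find XPath code for Java; what I did find was mostly copied from the W3School documentation. I also tried implementing the traversal in Java myself, but some of the explanations did not make sense to me, until I came across this blog post by a foreign author, which made everything click at once. Many thanks to him.

His original article follows. I have tested several of the examples and they all work. It is in English, so I have left it untranslated.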

 

XPath is a language for finding information in an XML file. You can say that XPath is (sort of) SQL for XML files. XPath is used to navigate through elements and attributes in an XML document. You can also use XPath to traverse through an XML file in Java.

XPath comes with powerful expressions that can be used to parse an xml document and retrieve relevant information.

For the demo, let us consider an XML file that holds employee information.

 
<?xml version="1.0"?>
<Employees>
    <Employee emplid="1111" type="admin">
        <firstname>John</firstname>
        <lastname>Watson</lastname>
        <age>30</age>
        <email>johnwatson@sh.com</email>
    </Employee>
    <Employee emplid="2222" type="admin">
        <firstname>Sherlock</firstname>
        <lastname>Homes</lastname>
        <age>32</age>
        <email>sherlock@sh.com</email>
    </Employee>
    <Employee emplid="3333" type="user">
        <firstname>Jim</firstname>
        <lastname>Moriarty</lastname>
        <age>52</age>
        <email>jim@sh.com</email>
    </Employee>
    <Employee emplid="4444" type="user">
        <firstname>Mycroft</firstname>
        <lastname>Holmes</lastname>
        <age>41</age>
        <email>mycroft@sh.com</email>
    </Employee>
</Employees>

 

I have saved this file at the path C:\employees.xml. We will use this XML file in our demo and try to fetch useful information from it using XPath. Before we start, let's check a few facts about the XML file above.

  1. There are 4 employees in our xml file
  2. Each employee has a unique employee id defined by attribute emplid
  3. Each employee also has an attribute type which defines whether an employee is admin or user.
  4. Each employee has four child nodes: firstname, lastname, age and email
  5. Age is a number

Let’s get started…

1. Learning Java DOM Parsing API

To understand XPath, we first need to understand the basics of DOM parsing in Java. Java provides a powerful DOM parser implementation through the API below.

1.1 Creating a Java DOM XML Parser

First, we need to create a document builder using the DocumentBuilderFactory class. Just follow the code; it is pretty much self-explanatory.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
//...
 
DocumentBuilderFactory builderFactory =
        DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
    builder = builderFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
    e.printStackTrace();  
}

 

 

1.2 Parsing XML with a Java DOM Parser

Once we have a document builder object, we use it to parse the XML file and create a document object.

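import org.w3c.dom.Document;
import java.io.FileInputStream;
import java.io.IOException;
import org.xml.sax.SAXException;
//...

// Declared outside the try block so the document can be used afterwards
Document xmlDocument = null;
try {
    // A missing file raises FileNotFoundException, which is an IOException
    xmlDocument = builder.parse(
            new FileInputStream("c:\\employees.xml"));
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}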

 

 

The code above parses an XML file from the filesystem. Sometimes you may want to parse XML held in a String instead of reading it from a file. The code below comes in handy for that.

String xml = ...;
Document xmlDocument = builder.parse(new ByteArrayInputStream(xml.getBytes()));
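
The following is a minimal, self-contained sketch of the same idea. The class name XmlStringParsing and the one-line sample XML are made up for illustration and are not part of the original tutorial. It passes an explicit charset to getBytes, on the assumption that the XML text should be treated as UTF-8; the no-argument getBytes() uses the platform default encoding, which may differ between machines.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original tutorial.
class XmlStringParsing {

    // Parses XML text into a DOM Document. Checked exceptions are wrapped
    // so that callers do not need their own try/catch.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Encode explicitly as UTF-8 rather than the platform default.
            return builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employees><Employee emplid=\"1111\"/></Employees>";
        Document document = parse(xml);
        // Prints the name of the root element: Employees
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```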

 

 

1.3 Creating an XPath object

Once we have a document object, we are ready to use XPath. Just create an XPath object using XPathFactory.

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
//...
 
XPath xPath =  XPathFactory.newInstance().newXPath();

 

 

1.4 Using XPath to parse the XML

Use the XPath object to compile an XPath expression and evaluate it against the document. In the code below we read the email address of the employee with employee id 3333. The code also shows the API calls for reading an XML node and a node list.

String expression = "/Employees/Employee[@emplid='3333']/email";
 
//read a string value
String email = xPath.compile(expression).evaluate(xmlDocument);
 
//read an xml node using xpath
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
 
//read a nodelist using xpath
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
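
To see how the three forms of evaluate differ, here is a small runnable sketch. The class name XPathReturnTypes, the method evaluateThreeWays and the cut-down inline XML are assumptions made for this illustration. It also compiles the expression once and reuses the resulting XPathExpression, instead of calling compile for every evaluation as the snippet above does.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original tutorial.
class XPathReturnTypes {

    // Cut-down sample data (two employees), inlined so no file is needed.
    static final String XML =
            "<Employees>"
          + "<Employee emplid=\"3333\"><email>jim@sh.com</email></Employee>"
          + "<Employee emplid=\"4444\"><email>mycroft@sh.com</email></Employee>"
          + "</Employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates one compiled expression three ways and reports
    // "string value | name of first node | number of nodes".
    static String evaluateThreeWays(Document doc, String expression) {
        try {
            XPathExpression compiled =
                    XPathFactory.newInstance().newXPath().compile(expression);
            String text = compiled.evaluate(doc);
            Node node = (Node) compiled.evaluate(doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            String nodeName = (node == null) ? "(none)" : node.getNodeName();
            return text + " | " + nodeName + " | " + nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // One match: prints "jim@sh.com | email | 1"
        System.out.println(evaluateThreeWays(doc, "/Employees/Employee[@emplid='3333']/email"));
        // Two matches: prints "jim@sh.com | email | 2"
        System.out.println(evaluateThreeWays(doc, "/Employees/Employee/email"));
    }
}
```

When the expression matches several nodes, the string and NODE forms only reflect the first match in document order, while NODESET returns all of them.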

 

 

2. Learning XPath Expressions

As mentioned above, XPath uses path expressions to select a node or a list of nodes from an XML document. Here is a list of useful paths and expressions for selecting nodes from an XML document.

Expression            Description
nodename              Selects all nodes with the name “nodename”
/                     Selects from the root node
//                    Selects nodes in the document from the current node that match the selection, no matter where they are
.                     Selects the current node
..                    Selects the parent of the current node
@                     Selects attributes
Employee              Selects all nodes with the name “Employee”
Employees/Employee    Selects all Employee elements that are children of Employees
//Employee            Selects all Employee elements, no matter where they are in the document
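
The path expressions in this table can be tried out with a short sketch like the one below. The class PathExpressionDemo, its select helper and the two-employee inline XML are hypothetical names and data chosen for this illustration. The last call shows that element names in XPath are case-sensitive, so a lowercase name does not match the Employee elements of the sample document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original tutorial.
class PathExpressionDemo {

    static final String XML =
            "<Employees>"
          + "<Employee emplid=\"1111\"><firstname>John</firstname><lastname>Watson</lastname></Employee>"
          + "<Employee emplid=\"3333\"><firstname>Jim</firstname><lastname>Moriarty</lastname></Employee>"
          + "</Employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text content of every node selected by the expression,
    // evaluated with the given document or node as the context.
    static List<String> select(Object context, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(select(doc, "//firstname"));                  // [John, Jim]
        System.out.println(select(doc, "/Employees/Employee/@emplid"));  // [1111, 3333]
        System.out.println(select(doc, "//lastname/../@emplid"));        // [1111, 3333]
        // Relative path, resolved against the root element as current node
        System.out.println(select(doc.getDocumentElement(), "./Employee/lastname")); // [Watson, Moriarty]
        System.out.println(select(doc, "//employee"));                   // [] (names are case-sensitive)
    }
}
```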

The expressions listed below are called predicates. Predicates are written in square brackets [ ... ]. They are used to find a specific node or a node that contains a specific value.

Path Expression                  Result
/Employees/Employee[1]           Selects the first Employee element that is a child of the Employees element
/Employees/Employee[last()]      Selects the last Employee element that is a child of the Employees element
/Employees/Employee[last()-1]    Selects the last but one Employee element that is a child of the Employees element
//Employee[@type='admin']        Selects all the Employee elements that have an attribute named type with a value of ‘admin’

There are other useful expressions that you can use to query the data.

Read this W3Schools page for more details: http://www.w3schools.com/xpath/xpath_syntax.asp

3. Examples: Query XML document using XPath

Below are a few examples that use different XPath expressions to fetch information from the XML document.

3.1 Read firstname of all employees

The expression below reads the firstname of all the employees.

String expression = "/Employees/Employee/firstname";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

 

 

Output:

John
Sherlock
Jim
Mycroft

3.2 Read a specific employee using employee id

The expression below reads the employee information for the employee with emplid = 2222. Note how we use the API to retrieve the node and then traverse it to print each XML tag and its value.

String expression = "/Employees/Employee[@emplid='2222']";
System.out.println(expression);
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
if(null != node) {
    NodeList nodeList = node.getChildNodes();
    for (int i = 0;null!=nodeList && i < nodeList.getLength(); i++) {
        Node nod = nodeList.item(i);
        if(nod.getNodeType() == Node.ELEMENT_NODE)
            System.out.println(nodeList.item(i).getNodeName() + " : " + nod.getFirstChild().getNodeValue()); 
    }
}

 

 

Output:

firstname : Sherlock
lastname : Homes
age : 32
email : sherlock@sh.com

3.3 Read firstname of all employees who are admin

This is another predicate example: it reads the firstname of all employees who are admins (defined by type=admin).

String expression = "/Employees/Employee[@type='admin']/firstname";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

 

 

Output:

John
Sherlock

3.4 Read firstname of all employees who are older than 40 years

See how we use a predicate to filter employees whose age is greater than 40.

String expression = "/Employees/Employee[age>40]/firstname";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

 

 

Output:

Jim
Mycroft
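
The comparison in this predicate works because XPath converts the text of the age element to a number before comparing it with 40. As a side note, the same API can return a number or a boolean directly through XPathConstants.NUMBER and XPathConstants.BOOLEAN. The sketch below is only an illustration; the class XPathNumberDemo, its helper methods and the inline XML (the four employees with just firstname and age) are assumptions, not code from the original article.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original tutorial.
class XPathNumberDemo {

    static final String XML =
            "<Employees>"
          + "<Employee><firstname>John</firstname><age>30</age></Employee>"
          + "<Employee><firstname>Sherlock</firstname><age>32</age></Employee>"
          + "<Employee><firstname>Jim</firstname><age>52</age></Employee>"
          + "<Employee><firstname>Mycroft</firstname><age>41</age></Employee>"
          + "</Employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates an expression and returns its result as an XPath number.
    static double number(Document doc, String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    // Evaluates an expression as a boolean; a node set is true when non-empty.
    static boolean matches(Document doc, String expression) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(number(doc, "count(/Employees/Employee[age>40])")); // 2.0
        System.out.println(number(doc, "sum(/Employees/Employee/age)"));       // 155.0
        System.out.println(matches(doc, "/Employees/Employee[age>60]"));       // false
    }
}
```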

3.5 Read firstname of first two employees (defined in xml file)

Within predicates, you can use position() to identify the position of an XML element. Here we select the first two employees using position().

String expression = "/Employees/Employee[position() <= 2]/firstname";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

 

 

Output:

John
Sherlock

4. Complete Java source code

To run this source, create a basic Java project in your IDE, or save the code below as Main.java and execute it. It needs the employees.xml file as input; copy the employee XML defined at the start of this tutorial to c:\employees.xml.

 
package net.viralpatel.java;
 
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
 
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
 
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
 
public class Main {
    public static void main(String[] args) {
 
        try {
            FileInputStream file = new FileInputStream(new File("c:/employees.xml"));
                 
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
             
            DocumentBuilder builder =  builderFactory.newDocumentBuilder();
             
            Document xmlDocument = builder.parse(file);
 
            XPath xPath =  XPathFactory.newInstance().newXPath();
 
            System.out.println("*************************");
            String expression = "/Employees/Employee[@emplid='3333']/email";
            System.out.println(expression);
            String email = xPath.compile(expression).evaluate(xmlDocument);
            System.out.println(email);
 
            System.out.println("*************************");
            expression = "/Employees/Employee/firstname";
            System.out.println(expression);
            NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
            }
 
            System.out.println("*************************");
            expression = "/Employees/Employee[@type='admin']/firstname";
            System.out.println(expression);
            nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
            }
 
            System.out.println("*************************");
            expression = "/Employees/Employee[@emplid='2222']";
            System.out.println(expression);
            Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
            if(null != node) {
                nodeList = node.getChildNodes();
                for (int i = 0;null!=nodeList && i < nodeList.getLength(); i++) {
                    Node nod = nodeList.item(i);
                    if(nod.getNodeType() == Node.ELEMENT_NODE)
                        System.out.println(nodeList.item(i).getNodeName() + " : " + nod.getFirstChild().getNodeValue()); 
                }
            }
             
            System.out.println("*************************");
 
            expression = "/Employees/Employee[age>40]/firstname";
            nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
            System.out.println(expression);
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
            }
         
            System.out.println("*************************");
            expression = "/Employees/Employee[1]/firstname";
            System.out.println(expression);
            nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
            }
            System.out.println("*************************");
            expression = "/Employees/Employee[position() <= 2]/firstname";
            System.out.println(expression);
            nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
            }
 
            System.out.println("*************************");
            expression = "/Employees/Employee[last()]/firstname";
            System.out.println(expression);
            nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
            }
 
            System.out.println("*************************");
 
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }       
    }
}

 

That’s all folks :)

 

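Thinking about the examples above, I came up with a case the article does not cover. I experimented and also read the XPath source code, and found that I still had to find the values by iterating over the nodes one by one. Below I demonstrate this clumsy approach; if anyone knows a better way, please tell me so we can all share it.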

 

My test code:

package com.fit.test01;

import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XPathEmployee {

	public static void main(String[] args) {

		DocumentBuilderFactory builderFactory = DocumentBuilderFactory
				.newInstance();

		DocumentBuilder docbuilder;
		InputStream is = null;

		try {

			is = XPathEmployee.class.getClassLoader().getResourceAsStream(
					"employees.xml");
			// You can either get the file's path or get the file as a stream; both work, and the file should be under the project's src directory
			// String strFilePath =
			// XPathEmployee.class.getClassLoader().getResource("employees.xml").toString();

			docbuilder = builderFactory.newDocumentBuilder();
			Document xmlDocument = docbuilder.parse(is);

			XPath xPath = XPathFactory.newInstance().newXPath();

			Node node = null;

			NodeList nodeList = null;

			String expression = "/Employees/Employee[@type='admin']";

			nodeList = (NodeList) xPath.compile(expression).evaluate(
					xmlDocument, XPathConstants.NODESET);

			for (int i = 0; i < nodeList.getLength(); i++) {
				node = nodeList.item(i);

				if (node.getNodeType() == Node.ELEMENT_NODE) {
					System.out.println(node.getNodeName() + " : "
							+ node.getFirstChild().getNodeValue());
					// At this point we are only at the Employee level, so we need to go one level further down
					if (node.hasChildNodes()) {
						System.out.println("----------------");

						NodeList nodeList1 = node.getChildNodes();

						for (int j = 0; j < nodeList1.getLength(); j++) {
							Node node1 = nodeList1.item(j);

							if (node1.getNodeType() == Node.ELEMENT_NODE) {
								System.out.println(node1.getNodeName() + " : "
										+ node1.getFirstChild().getNodeValue());
							}
						}
					}

				}

			}

		} catch (SAXException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} catch (ParserConfigurationException e1) {
			e1.printStackTrace();
		} catch (XPathExpressionException e) {
			e.printStackTrace();
		} finally {
			if (is != null) {
				try {
					is.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
}

 

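The output is as follows (what I mainly wanted to test is the case with multiple nodes; in his examples the expressions all pin down an attribute and each result is a single node, so what happens when there are several? See the approach above):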

Employee : 
		
----------------
firstname : John
lastname : Watson
age : 30
email : johnwatson@sh.com
Employee : 
		
----------------
firstname : Sherlock
lastname : Homes
age : 32
email : sherlock@sh.com
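
The blank "Employee :" lines in this output appear because the first child of each Employee element is a whitespace text node, not an element. One possible alternative to the nested loops, offered as a sketch rather than the definitive answer, is to select the Employee nodes once and then evaluate short relative XPath expressions with each Employee node as the context. The class name EmployeeFields, the method describe and the inline three-employee XML are hypothetical and only serve this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
class EmployeeFields {

    static final String XML =
            "<Employees>"
          + "<Employee emplid=\"1111\" type=\"admin\"><firstname>John</firstname><email>johnwatson@sh.com</email></Employee>"
          + "<Employee emplid=\"2222\" type=\"admin\"><firstname>Sherlock</firstname><email>sherlock@sh.com</email></Employee>"
          + "<Employee emplid=\"3333\" type=\"user\"><firstname>Jim</firstname><email>jim@sh.com</email></Employee>"
          + "</Employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Builds one line per matched Employee by evaluating relative
    // expressions against that Employee node.
    static List<String> describe(Document doc, String employeeExpression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList employees = (NodeList) xPath.evaluate(
                    employeeExpression, doc, XPathConstants.NODESET);
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                Node employee = employees.item(i);
                // No leading slash: these paths start at the current Employee.
                String id = xPath.evaluate("@emplid", employee);
                String firstname = xPath.evaluate("firstname", employee);
                String email = xPath.evaluate("email", employee);
                lines.add(id + ": " + firstname + " <" + email + ">");
            }
            return lines;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + employeeExpression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // Prints:
        // 1111: John <johnwatson@sh.com>
        // 2222: Sherlock <sherlock@sh.com>
        for (String line : describe(doc, "/Employees/Employee[@type='admin']")) {
            System.out.println(line);
        }
    }
}
```

This keeps the field names in XPath expressions instead of in node-type checks, so whitespace text nodes between the child elements never need to be skipped by hand.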

 
