Parse XML using Dom4j (repost)


Parsing XML

One of the first things you'll probably want to do is parse an XML document of some kind. This is easy to do in dom4j. The following code demonstrates how to do this.

import java.net.URL;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class Foo {

    public Document parse(URL url) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(url);
        return document;
    }
}

Using Iterators

A document can be navigated using a variety of methods that return standard Java Iterators. For example:

import java.util.Iterator;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.Element;

    public void bar(Document document) {

        Element root = document.getRootElement();

        // iterate through child elements of root
        for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            // do something
        }

        // iterate through child elements of root with element name "foo"
        for ( Iterator i = root.elementIterator( "foo" ); i.hasNext(); ) {
            Element foo = (Element) i.next();
            // do something
        }

        // iterate through attributes of root 
        for ( Iterator i = root.attributeIterator(); i.hasNext(); ) {
            Attribute attribute = (Attribute) i.next();
            // do something
        }
     }
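
As a rough, self-contained sketch of the iterator methods above, the class below gathers child element names and root attributes into lists. The class name IteratorSketch, its method names, and the sample document are made up for illustration and are not part of dom4j. The iterators are declared as Iterator<?> with explicit casts so the code should compile whether the dom4j version in use returns raw or generic iterators.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

// Hypothetical helper class, for illustration only.
public class IteratorSketch {

    // Names of the direct child elements of the root, in document order.
    public static List<String> childNames(Document document) {
        List<String> names = new ArrayList<String>();
        Element root = document.getRootElement();
        for (Iterator<?> i = root.elementIterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            names.add(element.getName());
        }
        return names;
    }

    // Number of direct child elements of the root with the given name.
    public static int countChildren(Document document, String name) {
        int count = 0;
        Element root = document.getRootElement();
        for (Iterator<?> i = root.elementIterator(name); i.hasNext(); ) {
            i.next();
            count++;
        }
        return count;
    }

    // "name=value" for each attribute of the root element.
    public static List<String> rootAttributes(Document document) {
        List<String> pairs = new ArrayList<String>();
        Element root = document.getRootElement();
        for (Iterator<?> i = root.attributeIterator(); i.hasNext(); ) {
            Attribute attribute = (Attribute) i.next();
            pairs.add(attribute.getName() + "=" + attribute.getValue());
        }
        return pairs;
    }

    // Assumed sample: <root id="r1"><foo/><bar/><foo/></root>
    public static Document sample() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("root").addAttribute("id", "r1");
        root.addElement("foo");
        root.addElement("bar");
        root.addElement("foo");
        return document;
    }
}
```

For the sample document, childNames should list foo, bar, foo, and countChildren with "foo" should report two matches.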

Powerful Navigation with XPath

In dom4j, XPath expressions can be evaluated on the Document or on any Node in the tree (such as an Attribute, Element or ProcessingInstruction). This allows complex navigation throughout the document with a single line of code. For example:

    public void bar(Document document) {

        // Get the value of the node using XPath
        List list = document.selectNodes( "//foo/bar" );

        Node node = document.selectSingleNode( "//foo/bar/author" );

        String name = node.valueOf( "@name" );
    }

For example, if you wish to find all the hypertext links in an XHTML document, the following code would do the trick.

    public void findLinks(Document document) throws DocumentException {

        List list = document.selectNodes( "//a/@href" );

        for (Iterator iter = list.iterator(); iter.hasNext(); ) {
            Attribute attribute = (Attribute) iter.next();
            String url = attribute.getValue();
        }
    }

If you need any help learning the XPath language, we highly recommend the Zvon tutorial, which allows you to learn by example.

Fast Looping

If you ever have to walk a large XML document tree, we recommend the fast looping method for performance, since it avoids the cost of creating an Iterator object for each loop. For example:

    public void treeWalk(Document document) {
        treeWalk( document.getRootElement() );
    }

    public void treeWalk(Element element) {
        for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
            Node node = element.node(i);
            if ( node instanceof Element ) {
                treeWalk( (Element) node );
            }
            else {
                // do something....
            }
        }
    }
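
To make the "do something" placeholder concrete, here is a small sketch that uses the same index-based loop to count every element in a subtree. The class name TreeWalkSketch, the method countElements, and the sample document are hypothetical and only serve the illustration.

```java
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

// Hypothetical helper class, for illustration only.
public class TreeWalkSketch {

    // Counts the elements in the subtree rooted at element, including element itself.
    public static int countElements(Element element) {
        int count = 1;
        // Index-based loop: no Iterator object is allocated per level.
        for (int i = 0, size = element.nodeCount(); i < size; i++) {
            Node node = element.node(i);
            if (node instanceof Element) {
                count += countElements((Element) node);
            }
            // Text, comments and other node types are skipped here.
        }
        return count;
    }

    // Assumed sample: <root><a><b/>text</a><c/></root>
    public static Document sample() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("root");
        Element a = root.addElement("a");
        a.addElement("b");
        a.addText("text");
        root.addElement("c");
        return document;
    }
}
```

The sample tree holds four elements (root, a, b, c), so countElements on its root should return 4. The text node inside a is visited but not counted.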

Creating a new XML document

Often in dom4j you will need to create a new document from scratch. Here's an example of doing that.

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Foo {

    public Document createDocument() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement( "root" );

        Element author1 = root.addElement( "author" )
            .addAttribute( "name", "James" )
            .addAttribute( "location", "UK" )
            .addText( "James Strachan" );
        
        Element author2 = root.addElement( "author" )
            .addAttribute( "name", "Bob" )
            .addAttribute( "location", "US" )
            .addText( "Bob McWhirter" );

        return document;
    }
}

Writing a document to a file

A quick and easy way to write a Document (or any Node) to a Writer is via the write() method.

  FileWriter out = new FileWriter( "foo.xml" );
  document.write( out );
  out.close();  // flush buffered output and release the file

If you want to be able to change the format of the output, such as pretty printing or a compact format, or you want to be able to work with Writer objects or OutputStream objects as the destination, then you can use the XMLWriter class.

import java.io.FileWriter;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class Foo {

    public void write(Document document) throws IOException {

        // let's write to a file
        XMLWriter writer = new XMLWriter(
            new FileWriter( "output.xml" )
        );
        writer.write( document );
        writer.close();


        // Pretty print the document to System.out
        OutputFormat format = OutputFormat.createPrettyPrint();
        writer = new XMLWriter( System.out, format );
        writer.write( document );

        // Compact format to System.out
        format = OutputFormat.createCompactFormat();
        writer = new XMLWriter( System.out, format );
        writer.write( document );
    }
}
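
Because XMLWriter accepts any Writer, the same approach can also render a Document into a String, which is handy for logging or tests. The following is a minimal sketch under that assumption. The class WriteSketch, the method toPrettyString, and the sample document are invented for illustration and are not part of dom4j.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// Hypothetical helper class, for illustration only.
public class WriteSketch {

    // Pretty prints the document into an in-memory buffer and returns the text.
    public static String toPrettyString(Document document) {
        StringWriter buffer = new StringWriter();
        try {
            XMLWriter writer = new XMLWriter(buffer, OutputFormat.createPrettyPrint());
            writer.write(document);
            writer.close(); // flushes any pending output into the buffer
        } catch (IOException e) {
            // Unlikely with a StringWriter; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    // Assumed sample, modelled on the createDocument() example earlier in this article.
    public static Document sample() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("root");
        root.addElement("author")
            .addAttribute("name", "James")
            .addAttribute("location", "UK")
            .addText("James Strachan");
        return document;
    }
}
```

Calling toPrettyString on the sample should produce indented XML with the author element nested under root.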

Converting to and from Strings

If you have a reference to a Document or any other Node such as an Attribute or Element, you can turn it into the default XML text via the asXML() method.

        Document document = ...;
        String text = document.asXML();

If you have some XML as a String, you can parse it back into a Document using the helper method DocumentHelper.parseText(), which throws a DocumentException if the text is not well-formed:

        String text = "<person> <name>James</name> </person>";
        Document document = DocumentHelper.parseText(text);
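
The two conversions can be chained into a round trip. The sketch below parses a String, serializes it with asXML(), parses the result again, and reads a child element's text to show that the content survives. The class RoundTripSketch and its method are hypothetical names used only for this illustration.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

// Hypothetical helper class, for illustration only.
public class RoundTripSketch {

    // Parses xml, serializes it, parses it again, and returns the text of
    // the root's <name> child (or null if there is no such child).
    public static String nameAfterRoundTrip(String xml) {
        try {
            Document first = DocumentHelper.parseText(xml);
            String text = first.asXML();
            Document second = DocumentHelper.parseText(text);
            return second.getRootElement().elementText("name");
        } catch (DocumentException e) {
            // parseText reports malformed input as a checked DocumentException.
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }
}
```

For the person example above, nameAfterRoundTrip should return "James".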

Styling a Document with XSLT

Applying XSLT on a Document is quite straightforward using the JAXP API from Sun. This allows you to work against any XSLT engine such as Xalan or SAXON. Here is an example of using JAXP to create a transformer and then applying it to a Document.

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

import org.dom4j.Document;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;

public class Foo {

    public Document styleDocument(
        Document document, 
        String stylesheet
    ) throws Exception {

        // load the transformer using JAXP
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer( 
            new StreamSource( stylesheet ) 
        );

        // now let's style the given document
        DocumentSource source = new DocumentSource( document );
        DocumentResult result = new DocumentResult();
        transformer.transform( source, result );

        // return the transformed document
        Document transformedDoc = result.getDocument();
        return transformedDoc;
    }
}


Posted by zhyiwww, 2006-10-24 19:26