niwtsew

SAX Quick Start



refer to http://www.saxproject.org/

Quickstart

This document provides a quick-start tutorial for Java programmers who wish to use SAX2 in their programs.

Requirements

SAX is a common interface implemented for many different XML parsers (and things that pose as XML parsers), just as JDBC is a common interface implemented for many different relational databases (and things that pose as relational databases). If you want to use SAX, you'll need all of the following:

  • Java 1.1 or higher.
  • A SAX2-compatible XML parser installed on your Java classpath. (If you need such a parser, see the links page on the SAX project site.)
  • The SAX2 distribution installed on your Java classpath. (This probably came with your parser.)

Most Java/XML tools distributions include SAX2 and a parser using it. Most web application servers use it for their core XML support. In particular, environments with JAXP 1.1 support include SAX2.
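Because JAXP ships with the JDK (from 1.4 on), you can confirm that a SAX2 parser is available by asking JAXP for an XMLReader. This sketch is not part of the original tutorial, and the class name JaxpReaders is hypothetical. It calls setNamespaceAware(true) because JAXP factories are not namespace-aware by default, and the examples below expect namespace URIs to be reported.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper: obtains a SAX2 XMLReader through JAXP.
class JaxpReaders {

    static final String NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces";

    static XMLReader newNamespaceAwareReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP defaults to namespace-unaware parsing; turn it on so that
        // startElement receives namespace URIs and local names.
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader xr = newNamespaceAwareReader();
        System.out.println("Reader class: " + xr.getClass().getName());
        System.out.println("Namespaces on: " + xr.getFeature(NAMESPACES_FEATURE));
    }
}
```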

Parsing a document

Start by creating a class that extends DefaultHandler:

import org.xml.sax.helpers.DefaultHandler;

public class MySAXApp extends DefaultHandler
{

    public MySAXApp ()
    {
        super();
    }

}

Since this is a Java application, we'll create a static main method that uses the createXMLReader method from the XMLReaderFactory class to choose a SAX driver dynamically. Note the "throws Exception" wimp-out; real applications would need real error handling:

public static void main (String args[])
    throws Exception
{
    XMLReader xr = XMLReaderFactory.createXMLReader();
}

If your Java environment does not provide a compiled-in default (or use the META-INF/services/org.xml.sax.driver system resource), you'll probably need to set the org.xml.sax.driver Java system property to the full classname of the SAX driver, as in

java -Dorg.xml.sax.driver=com.example.xml.SAXDriver MySAXApp sample.xml

Several of the SAX2 drivers currently in widespread use are listed on the "links" page. Class names you might use include:

Class Name                                  Notes
gnu.xml.aelfred2.SAXDriver                  Lightweight non-validating parser; Free Software
gnu.xml.aelfred2.XmlReader                  Optionally validates; Free Software
oracle.xml.parser.v2.SAXParser              Optionally validates; proprietary
org.apache.crimson.parser.XMLReaderImpl     Optionally validates; used in JDK 1.4; Open Source
org.apache.xerces.parsers.SAXParser         Optionally validates; Open Source
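If you would rather name a driver in code than on the command line, XMLReaderFactory also provides a createXMLReader(String className) overload. The sketch below tries a named driver and falls back to a JAXP-created reader if that driver cannot be loaded. Both the fallback and the DriverSelect class name are assumptions of this sketch, not SAX behavior. XMLReaderFactory has been deprecated since Java 9 but is still present, which is why the sketch uses @SuppressWarnings.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper: load a named SAX2 driver, or fall back to JAXP.
class DriverSelect {

    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    static XMLReader createReader(String driverClass) throws Exception {
        try {
            // Same effect as -Dorg.xml.sax.driver=..., but only for this call.
            return XMLReaderFactory.createXMLReader(driverClass);
        } catch (SAXException e) {
            // Driver class missing or not instantiable: use the JAXP default.
            System.err.println("Falling back to JAXP: " + e.getMessage());
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        }
    }

    public static void main(String[] args) throws Exception {
        // com.example.xml.SAXDriver does not exist, so this takes the fallback path.
        XMLReader xr = createReader("com.example.xml.SAXDriver");
        System.out.println(xr.getClass().getName());
    }
}
```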

Alternatively, if you don't mind coupling your application to a specific SAX driver, you can call its constructor directly. We assume that the SAX driver for your XML parser is named com.example.xml.SAXDriver, but this class does not really exist. You must know the name of the real driver for your parser to use this approach.

public static void main (String args[])
    throws Exception
{
    XMLReader xr = new com.example.xml.SAXDriver();
}

We can use this object to parse XML documents, but first, we have to register event handlers that the parser can use for reporting information, using the setContentHandler and setErrorHandler methods from the XMLReader interface. In a real-world application, the handlers will usually be separate objects, but for this simple demo, we've bundled the handlers into the top-level class, so we just have to instantiate the class and register it with the XML reader:

public static void main (String args[])
    throws Exception
{
    XMLReader xr = XMLReaderFactory.createXMLReader();
    MySAXApp handler = new MySAXApp();
    xr.setContentHandler(handler);
    xr.setErrorHandler(handler);
}

This code creates an instance of MySAXApp to receive XML parsing events, and registers it with the XML reader for regular content events and error events (there are other kinds, but they're rarely used). Now, let's assume that all of the command-line args are file names, and we'll try to parse them one-by-one using the parse method from the XMLReader interface:

public static void main (String args[])
    throws Exception
{
    XMLReader xr = XMLReaderFactory.createXMLReader();
    MySAXApp handler = new MySAXApp();
    xr.setContentHandler(handler);
    xr.setErrorHandler(handler);

    // Parse each file provided on the
    // command line.
    for (int i = 0; i < args.length; i++) {
        FileReader r = new FileReader(args[i]);
        xr.parse(new InputSource(r));
    }
}

Note that each reader must be wrapped in an InputSource object to be parsed. Here's the whole demo class together (so far):

import java.io.FileReader;

import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;


public class MySAXApp extends DefaultHandler
{

    public static void main (String args[])
        throws Exception
    {
        XMLReader xr = XMLReaderFactory.createXMLReader();
        MySAXApp handler = new MySAXApp();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);

        // Parse each file provided on the
        // command line.
        for (int i = 0; i < args.length; i++) {
            FileReader r = new FileReader(args[i]);
            xr.parse(new InputSource(r));
        }
    }


    public MySAXApp ()
    {
        super();
    }
}

You can compile and run this code (if your environment has no default SAX driver, specify one in the org.xml.sax.driver property). Nothing much will happen unless the document contains malformed XML, because you have not yet set up your application to handle SAX events.
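You can see the error path without preparing a broken file. The sketch below parses a malformed document from a string, since any InputSource works, not only one that wraps a FileReader. It also registers a separate error-handler object, as a real application usually would. The class names are hypothetical. The reader comes from JAXP with namespace awareness turned on, which is an assumption of this sketch; XMLReaderFactory.createXMLReader() should behave the same way here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical error handler kept separate from the content handler.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("warning: " + describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors (e.g. validity errors): record and keep parsing.
        messages.add("error: " + describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Well-formedness errors: record, then stop the parse.
        messages.add("fatal: " + describe(e));
        throw e;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
            + ": " + e.getMessage();
    }
}

class ErrorDemo {
    static CollectingErrorHandler parseString(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler errors = new CollectingErrorHandler();
        xr.setErrorHandler(errors);
        try {
            xr.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parse stops once fatalError rethrows; the message is already recorded.
        }
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // <b> is never closed, so the parser reports a fatal error.
        for (String m : parseString("<a><b></a>").messages) {
            System.out.println(m);
        }
    }
}
```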

Handling events

Things get interesting when you start implementing methods to respond to XML parsing events (remember that we registered our class to receive XML parsing events in the previous section). The most important events are the start and end of the document, the start and end of elements, and character data.

To find out about the start and end of the document, the client application implements the startDocument and endDocument methods:

public void startDocument ()
{
    System.out.println("Start document");
}

public void endDocument ()
{
    System.out.println("End document");
}

The start/endDocument event handlers take no arguments. When the SAX driver finds the beginning of the document, it will invoke the startDocument method once; when it finds the end, it will invoke the endDocument method once (even if there have been errors).

These examples simply print a message to standard output, but your application can contain any arbitrary code in these handlers: most commonly, the code will build some kind of an in-memory tree, produce output, populate a database, or extract information from the XML stream.
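To see this ordering concretely, here is a sketch of a handler that records each callback in a list instead of printing it, so startDocument should come first and endDocument last. The EventLog name is hypothetical. Like the other sketches here, it assumes a namespace-aware reader obtained through JAXP and parses a string rather than a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that logs the order of callbacks.
class EventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        events.add("end " + qName);
    }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        EventLog log = new EventLog();
        xr.setContentHandler(log);
        xr.parse(new InputSource(new StringReader(xml)));
        return log.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><p/></doc>"));
        // prints [startDocument, start doc, start p, end p, end doc, endDocument]
    }
}
```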

The SAX driver will signal the start and end of elements in much the same way, except that it will also pass some parameters to the startElement and endElement methods:

public void startElement (String uri, String name,
                          String qName, Attributes atts)
{
    if ("".equals (uri))
        System.out.println("Start element: " + qName);
    else
        System.out.println("Start element: {" + uri + "}" + name);
}

public void endElement (String uri, String name, String qName)
{
    if ("".equals (uri))
        System.out.println("End element: " + qName);
    else
        System.out.println("End element: {" + uri + "}" + name);
}

These methods print a message every time an element starts or ends, with any Namespace URI in braces before the element's local name. The qName contains the raw XML 1.0 name, which you must use for all elements that don't have a namespace URI. In this quick introduction, we won't look at how attributes are accessed; you can access them by name, or by iterating through them much as if they were a vector.
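Here is a sketch of both ways of reading attributes mentioned above. The first is by name, with getValue(String). The second is by index, with getLength(), getQName(int) and getValue(int). The AttributeDump class is hypothetical. Parsers commonly reuse the Attributes object between calls, so the sketch copies values out instead of keeping a reference.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that copies the attributes of <item> elements.
class AttributeDump extends DefaultHandler {
    final Map<String, String> attrs = new LinkedHashMap<>();
    String idByName;

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        if (!"item".equals(qName)) return;
        // Access by name: returns null if the attribute is absent.
        idByName = atts.getValue("id");
        // Access by index, much like walking through a vector.
        for (int i = 0; i < atts.getLength(); i++) {
            attrs.put(atts.getQName(i), atts.getValue(i));
        }
    }

    static AttributeDump run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        AttributeDump dump = new AttributeDump();
        xr.setContentHandler(dump);
        xr.parse(new InputSource(new StringReader(xml)));
        return dump;
    }

    public static void main(String[] args) throws Exception {
        AttributeDump d = run("<list><item id=\"7\" color=\"red\"/></list>");
        System.out.println("id by name: " + d.idByName + ", all: " + d.attrs);
    }
}
```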

Finally, SAX2 reports regular character data through the characters method. The following implementation prints all character data to the screen. It is a little longer because it pretty-prints the output by escaping special characters:

public void characters (char ch[], int start, int length)
{
    System.out.print("Characters:    \"");
    for (int i = start; i < start + length; i++) {
        switch (ch[i]) {
        case '\\':
            System.out.print("\\\\");
            break;
        case '"':
            System.out.print("\\\"");
            break;
        case '\n':
            System.out.print("\\n");
            break;
        case '\r':
            System.out.print("\\r");
            break;
        case '\t':
            System.out.print("\\t");
            break;
        default:
            System.out.print(ch[i]);
            break;
        }
    }
    System.out.print("\"\n");
}

Note that a SAX driver is free to chunk the character data any way it wants, so you cannot count on all of the character data content of an element arriving in a single characters event.
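A common way to cope with this is to append each chunk to a buffer and use the text only when the element ends. Here is a minimal sketch, assuming the text of interest sits in leaf elements such as title. The TextCollector name is hypothetical. The entity reference in the test input is a typical place where a parser may deliver several chunks, and the buffer joins them back together.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that joins character chunks per element.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        buffer.setLength(0); // discard whitespace seen before this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        texts.add(qName + "=" + buffer.toString());
        buffer.setLength(0);
    }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        TextCollector collector = new TextCollector();
        xr.setContentHandler(collector);
        xr.parse(new InputSource(new StringReader(xml)));
        return collector.texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<poem><title>Roses &amp; Thorns</title><l>Roses are red,</l></poem>"));
    }
}
```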

Sample SAX2 application

Here is the complete sample application (again, in a serious app the event handlers would probably be implemented in a separate class):

import java.io.FileReader;

import org.xml.sax.XMLReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;


public class MySAXApp extends DefaultHandler
{

    public static void main (String args[])
        throws Exception
    {
        XMLReader xr = XMLReaderFactory.createXMLReader();
        MySAXApp handler = new MySAXApp();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);

        // Parse each file provided on the
        // command line.
        for (int i = 0; i < args.length; i++) {
            FileReader r = new FileReader(args[i]);
            xr.parse(new InputSource(r));
        }
    }


    public MySAXApp ()
    {
        super();
    }


    ////////////////////////////////////////////////////////////////////
    // Event handlers.
    ////////////////////////////////////////////////////////////////////


    public void startDocument ()
    {
        System.out.println("Start document");
    }


    public void endDocument ()
    {
        System.out.println("End document");
    }


    public void startElement (String uri, String name,
                              String qName, Attributes atts)
    {
        if ("".equals (uri))
            System.out.println("Start element: " + qName);
        else
            System.out.println("Start element: {" + uri + "}" + name);
    }


    public void endElement (String uri, String name, String qName)
    {
        if ("".equals (uri))
            System.out.println("End element:   " + qName);
        else
            System.out.println("End element:   {" + uri + "}" + name);
    }


    public void characters (char ch[], int start, int length)
    {
        System.out.print("Characters:    \"");
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
            case '\\':
                System.out.print("\\\\");
                break;
            case '"':
                System.out.print("\\\"");
                break;
            case '\n':
                System.out.print("\\n");
                break;
            case '\r':
                System.out.print("\\r");
                break;
            case '\t':
                System.out.print("\\t");
                break;
            default:
                System.out.print(ch[i]);
                break;
            }
        }
        System.out.print("\"\n");
    }

}
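The original tutorial leaves one refinement open. FileReader decodes bytes with the platform default charset and ignores the document's encoding declaration, and the loop above never closes it. The sketch below, with a hypothetical FileParse class, instead hands the parser a raw byte stream, so the parser can detect the encoding itself. It closes the stream with try-with-resources and sets the file URI as the system ID.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FileParse {
    // Parse one file from raw bytes so the parser honors the encoding
    // declaration; the stream is closed even if parsing fails.
    static void parseFile(XMLReader xr, Path path) throws Exception {
        try (InputStream in = Files.newInputStream(path)) {
            InputSource source = new InputSource(in);
            // The system ID lets relative URIs resolve and improves error messages.
            source.setSystemId(path.toUri().toString());
            xr.parse(source);
        }
    }

    // Hypothetical convenience: collect all character data from a file.
    static String textOf(Path path) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        StringBuilder text = new StringBuilder();
        xr.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        parseFile(xr, path);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            System.out.println(arg + ": " + textOf(Paths.get(arg)));
        }
    }
}
```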

Sample Output

Consider the following XML document:

<?xml version="1.0"?>

<poem xmlns="http://www.megginson.com/ns/exp/poetry">
<title>Roses are Red</title>
<l>Roses are red,</l>
<l>Violets are blue;</l>
<l>Sugar is sweet,</l>
<l>And I love you.</l>
</poem>

If this document is named roses.xml and there is a SAX2 driver on your classpath named com.example.xml.SAXDriver (this driver does not actually exist), you can invoke the sample application like this:

java -Dorg.xml.sax.driver=com.example.xml.SAXDriver MySAXApp roses.xml

When you run this, you'll get output something like this:

Start document
Start element: {http://www.megginson.com/ns/exp/poetry}poem
Characters: "\n"
Start element: {http://www.megginson.com/ns/exp/poetry}title
Characters: "Roses are Red"
End element: {http://www.megginson.com/ns/exp/poetry}title
Characters: "\n"
Start element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "Roses are red,"
End element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "\n"
Start element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "Violets are blue;"
End element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "\n"
Start element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "Sugar is sweet,"
End element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "\n"
Start element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "And I love you."
End element: {http://www.megginson.com/ns/exp/poetry}l
Characters: "\n"
End element: {http://www.megginson.com/ns/exp/poetry}poem
End document

Note that even this short document generates (at least) 25 events: one for the start and end of each of the six elements used (or, if you prefer, one for each start tag and one for each end tag), one for each of the eleven chunks of character data (including whitespace between elements), one for the start of the document, and one for the end.
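You can check this tally mechanically. The sketch below uses a hypothetical EventCounter handler that counts document, element and character callbacks for the roses.xml text above, parsed from a string. The character count is only a lower bound, because a parser may split a chunk further.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that tallies SAX callbacks by kind.
class EventCounter extends DefaultHandler {
    int documentEvents, elementEvents, characterEvents;

    @Override public void startDocument() { documentEvents++; }
    @Override public void endDocument() { documentEvents++; }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        elementEvents++;
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        elementEvents++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        characterEvents++;
    }

    int total() { return documentEvents + elementEvents + characterEvents; }

    // The roses.xml document from the text, inlined as a string.
    static final String ROSES =
        "<?xml version=\"1.0\"?>\n\n"
        + "<poem xmlns=\"http://www.megginson.com/ns/exp/poetry\">\n"
        + "<title>Roses are Red</title>\n"
        + "<l>Roses are red,</l>\n"
        + "<l>Violets are blue;</l>\n"
        + "<l>Sugar is sweet,</l>\n"
        + "<l>And I love you.</l>\n"
        + "</poem>\n";

    static EventCounter count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        EventCounter counter = new EventCounter();
        xr.setContentHandler(counter);
        xr.parse(new InputSource(new StringReader(xml)));
        return counter;
    }

    public static void main(String[] args) throws Exception {
        EventCounter c = count(ROSES);
        System.out.println("document=" + c.documentEvents + " element=" + c.elementEvents
            + " characters=" + c.characterEvents + " total=" + c.total());
    }
}
```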

If the input document did not include the xmlns="http://www.megginson.com/ns/exp/poetry" attribute to declare that all the elements are in that namespace, the output would instead be like:

Start document
Start element: poem
Characters: "\n"
Start element: title
Characters: "Roses are Red"
End element: title
Characters: "\n"
Start element: l
Characters: "Roses are red,"
End element: l
Characters: "\n"
Start element: l
Characters: "Violets are blue;"
End element: l
Characters: "\n"
Start element: l
Characters: "Sugar is sweet,"
End element: l
Characters: "\n"
Start element: l
Characters: "And I love you."
End element: l
Characters: "\n"
End element: poem
End document

You will most likely work with both types of documents: ones using XML namespaces, and ones not using them. You may also work with documents that have some elements (and attributes) with namespaces, and some without. Make sure that your code actually tests for namespace URIs, rather than assuming they are always present (or always missing).
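One way to follow this advice is to compare the (namespace URI, local name) pair instead of the raw qName. In the sketch below, the hypothetical LineCounter counts only l elements in the poetry namespace, whatever prefix is used. The same poem without the xmlns declaration therefore yields zero matches instead of a false positive. The sketch assumes a namespace-aware reader, obtained here through JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that matches elements by namespace URI and local name.
class LineCounter extends DefaultHandler {
    static final String POETRY_NS = "http://www.megginson.com/ns/exp/poetry";
    int lines;

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        // An unqualified <l> has uri "" and is deliberately not counted;
        // a prefixed <p:l> in the poetry namespace is counted.
        if (POETRY_NS.equals(uri) && "l".equals(name)) {
            lines++;
        }
    }

    static int count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        LineCounter counter = new LineCounter();
        xr.setContentHandler(counter);
        xr.parse(new InputSource(new StringReader(xml)));
        return counter.lines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<poem xmlns=\"" + POETRY_NS + "\"><l>a</l><l>b</l></poem>"));
        System.out.println(count("<poem><l>a</l></poem>"));
    }
}
```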
