StAX API

The StAX API exposes methods for iterative, event-based processing of XML documents. XML documents are treated as a filtered series of events, and infoset states can be stored in a procedural fashion. Moreover, unlike SAX, the StAX API is bidirectional, enabling both reading and writing of XML documents.

The StAX API is really two distinct API sets: a cursor API and an iterator API. These two API sets are explained in greater detail later in this chapter, but their main features are briefly described below.

Cursor API

As the name implies, the StAX cursor API represents a cursor with which you can walk an XML document from beginning to end. This cursor can point to one thing at a time, and always moves forward, never backward, usually one infoset element at a time.

The two main cursor interfaces are XMLStreamReader and XMLStreamWriter. XMLStreamReader includes accessor methods for all possible information retrievable from the XML Information model, including document encoding, element names, attributes, namespaces, text nodes, start tags, comments, processing instructions, document boundaries, and so forth; for example:

public interface XMLStreamReader {
  public int next() throws XMLStreamException;
  public boolean hasNext() throws XMLStreamException;
  public String getText();
  public String getLocalName();
  public String getNamespaceURI();
  // ... other methods not shown
} 

You can call methods on XMLStreamReader, such as getText and getName, to get data at the current cursor location. XMLStreamWriter provides methods that correspond to StartElement and EndElement event types; for example:

public interface XMLStreamWriter {
  public void writeStartElement(String localName)
    throws XMLStreamException;
  public void writeEndElement()
    throws XMLStreamException;
  public void writeCharacters(String text)
    throws XMLStreamException;
  // ... other methods not shown
}
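
As a minimal sketch of how these writer methods fit together, the following writes a single Cost element to a string. The class name CursorWriteSketch and the method writeCost are hypothetical names chosen for illustration, and the example assumes the StAX implementation bundled with the JDK.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration class; not part of the StAX API.
class CursorWriteSketch {

    // Writes one element with an attribute and text content using the cursor API.
    static String writeCost() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("Cost");          // opens <Cost
        writer.writeAttribute("currency", "INR");  // must follow the start element directly
        writer.writeCharacters("11.50");           // text content, escaped as needed
        writer.writeEndElement();                  // closes the most recently opened element
        writer.flush();
        writer.close();                            // releases the writer, not the underlying StringWriter
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeCost());
    }
}
```

Note that writeEndElement takes no argument: the writer tracks which element is currently open, so the application only has to keep its start and end calls balanced.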

The cursor API mirrors SAX in many ways. For example, methods are available for directly accessing string and character information, and integer indexes can be used to access attribute and namespace information. As with SAX, the cursor API methods return XML information as strings, which minimizes object allocation requirements.
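
The index-based attribute access described above might look like the following sketch. CursorReadSketch and describe are hypothetical names, and the IS_COALESCING property is set only so that adjacent text is reported as a single CHARACTERS event in this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of the StAX API.
class CursorReadSketch {

    // Walks the document forward and records a short description of each event of interest.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> seen = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // advance the cursor; it never moves backward
                if (event == XMLStreamConstants.START_ELEMENT) {
                    StringBuilder line = new StringBuilder("start " + reader.getLocalName());
                    // Attributes are read by integer index and returned as plain strings.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        line.append(' ').append(reader.getAttributeLocalName(i))
                            .append('=').append(reader.getAttributeValue(i));
                    }
                    seen.add(line.toString());
                } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    seen.add("text " + reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    seen.add("end " + reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        describe("<Cost currency=\"INR\">11.50</Cost>").forEach(System.out::println);
    }
}
```

Because the accessors read from the current cursor position, any value the application wants to keep must be copied out before the next call to next().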

Iterator API

The StAX iterator API represents an XML document stream as a set of discrete event objects. These events are pulled by the application and provided by the parser in the order in which they are read in the source XML document.

The base iterator interface is called XMLEvent, and there are subinterfaces for each event type listed in Table 3-2, below. The primary parser interface for reading iterator events is XMLEventReader, and the primary interface for writing iterator events is XMLEventWriter. The XMLEventReader interface contains five methods, the most important of which is nextEvent(), which returns the next event in an XML stream. XMLEventReader implements java.util.Iterator, which means that events returned by XMLEventReader can be cached or passed into routines that can work with the standard Java Iterator; for example:

public interface XMLEventReader extends Iterator {
  public XMLEvent nextEvent() throws XMLStreamException;
  public boolean hasNext();
  public XMLEvent peek() throws XMLStreamException;
  // ... other methods not shown
}
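
One way these reader methods might be used together is sketched below: nextEvent() consumes an event, while peek() looks at the following one without consuming it. EventReadSketch and elementNames are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration class; not part of the StAX API.
class EventReadSketch {

    // Collects element names, noting whether character data directly follows each start tag.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();       // consumes the event
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    XMLEvent upcoming = reader.peek();     // does not consume the event
                    boolean textFollows = upcoming != null && upcoming.isCharacters();
                    names.add(textFollows ? name + " (text follows)" : name);
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        elementNames("<Book><Title>Yoga</Title></Book>").forEach(System.out::println);
    }
}
```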

Similarly, on the output side of the iterator API, you have:

public interface XMLEventWriter {
  public void flush() throws XMLStreamException;
  public void close() throws XMLStreamException;
  public void add(XMLEvent e) throws XMLStreamException;
  public void add(XMLEventReader reader)
    throws XMLStreamException;
  // ... other methods not shown
}
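
A rough sketch of the output side follows. Events are created with XMLEventFactory and handed to the writer one at a time; EventWriteSketch and writeIsbn are hypothetical names, and the empty prefix and namespace URI arguments simply mean the element is in no namespace.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical illustration class; not part of the StAX API.
class EventWriteSketch {

    // Builds three event objects and writes them through an XMLEventWriter.
    static String writeIsbn() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        // Arguments are prefix, namespace URI, local name.
        writer.add(events.createStartElement("", "", "ISBN"));
        writer.add(events.createCharacters("81-40-34319-4"));
        writer.add(events.createEndElement("", "", "ISBN"));

        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeIsbn());
    }
}
```

Unlike the cursor writer, the event writer receives complete objects, so the same events could just as well have come from an XMLEventReader.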

Iterator Event Types

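Table 3-2 describes the XMLEvent types defined in the event iterator API. The API defines thirteen types in all; two of them, EntityDeclaration and NotationDeclaration, are not described in the table.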

Table 3-2 XMLEvent Types

| Event Type | Description |
|---|---|
| StartDocument | Reports the beginning of a set of XML events, including encoding, XML version, and standalone properties. |
| StartElement | Reports the start of an element, including any attributes and namespace declarations; also provides access to the prefix, namespace URI, and local name of the start tag. |
| EndElement | Reports the end tag of an element. Namespaces that have gone out of scope can be recalled here if they have been explicitly set on their corresponding StartElement. |
| Characters | Corresponds to XML CData sections and CharacterData entities. Note that ignorable whitespace and significant whitespace are also reported as Characters events. |
| EntityReference | Character entities can be reported as discrete events, which an application developer can then choose to resolve or pass through unresolved. By default, entities are resolved. Alternatively, if you do not want to report the entity as an event, replacement text can be substituted and reported as Characters. |
| ProcessingInstruction | Reports the target and data for an underlying processing instruction. |
| Comment | Returns the text of a comment. |
| EndDocument | Reports the end of a set of XML events. |
| DTD | Reports as java.lang.String information about the DTD, if any, associated with the stream, and provides a method for returning custom objects found in the DTD. |
| Attribute | Attributes are generally reported as part of a StartElement event. However, there are times when it is desirable to return an attribute as a standalone Attribute event; for example, when an attribute is returned as the result of an XQuery or XPath expression. |
| Namespace | As with attributes, namespaces are usually reported as part of a StartElement, but there are times when it is desirable to report a namespace as a discrete Namespace event. |

Note that the DTD, EntityDeclaration, EntityReference, and NotationDeclaration events are only created if the document being processed contains a DTD.

Sample Event Mapping

As an example of how the event iterator API maps an XML stream, consider the following XML document:

<?xml version="1.0"?>
<BookCatalogue xmlns="http://www.publishing.org">
  <Book>
    <Title>Yogasana Vijnana: the Science of Yoga</Title>
    <ISBN>81-40-34319-4</ISBN>
    <Cost currency="INR">11.50</Cost>
  </Book>
</BookCatalogue> 

This document would be parsed into eighteen primary and secondary events, as shown below. Note that secondary events, shown in curly braces ({}), are typically accessed from a primary event rather than directly.

Table 3-3 Sample Iterator API Event Mapping

| # | Element/Attribute | Event |
|---|---|---|
| 1 | version="1.0" | StartDocument |
| 2 | isCData = false; data = "\n"; IsWhiteSpace = true | Characters |
| 3 | qname = BookCatalogue:http://www.publishing.org; attributes = null; namespaces = {"BookCatalogue" -> "http://www.publishing.org"} | StartElement |
| 4 | qname = Book; attributes = null; namespaces = null | StartElement |
| 5 | qname = Title; attributes = null; namespaces = null | StartElement |
| 6 | isCData = false; data = "Yogasana Vijnana: the Science of Yoga\n\t"; IsWhiteSpace = false | Characters |
| 7 | qname = Title; namespaces = null | EndElement |
| 8 | qname = ISBN; attributes = null; namespaces = null | StartElement |
| 9 | isCData = false; data = "81-40-34319-4\n\t"; IsWhiteSpace = false | Characters |
| 10 | qname = ISBN; namespaces = null | EndElement |
| 11 | qname = Cost; attributes = {"currency" -> INR}; namespaces = null | StartElement |
| 12 | isCData = false; data = "11.50\n\t"; IsWhiteSpace = false | Characters |
| 13 | qname = Cost; namespaces = null | EndElement |
| 14 | isCData = false; data = "\n"; IsWhiteSpace = true | Characters |
| 15 | qname = Book; namespaces = null | EndElement |
| 16 | isCData = false; data = "\n"; IsWhiteSpace = true | Characters |
| 17 | qname = BookCatalogue:http://www.publishing.org; namespaces = {"BookCatalogue" -> "http://www.publishing.org"} | EndElement |
| 18 | | EndDocument |

There are several important things to note in the above example:

  • The events are created in the order in which the corresponding XML elements are encountered in the document, including nesting of elements, opening and closing of elements, attribute order, document start and document end, and so forth.
  • As with proper XML syntax, all container elements have corresponding start and end events; for example, every StartElement has a corresponding EndElement, even for empty elements.
  • Attribute events are treated as secondary events, and are accessed from their corresponding StartElement event.
  • Similar to Attribute events, Namespace events are treated as secondary, but appear twice and are accessible twice in the event stream, first from their corresponding StartElement and then from their corresponding EndElement.
  • Characters events are specified for all elements, even if those elements have no character data. Similarly, character data can be split across several Characters events.
  • The StAX parser maintains a namespace stack, which holds information about all XML namespaces defined for the current element and its ancestors. The namespace stack is exposed through the javax.xml.namespace.NamespaceContext interface, and can be accessed by namespace prefix or URI.
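
To see how secondary events are reached from primary ones, the sample document can be run through the iterator API as in the sketch below. EventMappingSketch, SAMPLE, and trace are hypothetical names; the exact number and content of Characters events depends on the parser, so the sketch skips whitespace-only text and trims the rest.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration class; not part of the StAX API.
class EventMappingSketch {

    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<BookCatalogue xmlns=\"http://www.publishing.org\">\n"
        + "  <Book>\n"
        + "    <Title>Yogasana Vijnana: the Science of Yoga</Title>\n"
        + "    <ISBN>81-40-34319-4</ISBN>\n"
        + "    <Cost currency=\"INR\">11.50</Cost>\n"
        + "  </Book>\n"
        + "</BookCatalogue>";

    // Returns one line per primary event, with attributes and namespaces
    // pulled from their StartElement rather than read as separate events.
    static List<String> trace(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    lines.add("StartDocument");
                } else if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    StringBuilder line =
                        new StringBuilder("StartElement " + start.getName().getLocalPart());
                    for (Iterator<?> it = start.getAttributes(); it.hasNext();) {
                        Attribute attr = (Attribute) it.next();
                        line.append(" @").append(attr.getName().getLocalPart())
                            .append('=').append(attr.getValue());
                    }
                    for (Iterator<?> it = start.getNamespaces(); it.hasNext();) {
                        Namespace ns = (Namespace) it.next();
                        line.append(" xmlns=").append(ns.getNamespaceURI()); // prefix omitted for brevity
                    }
                    lines.add(line.toString());
                } else if (event.isCharacters()) {
                    if (!event.asCharacters().isWhiteSpace()) {
                        lines.add("Characters " + event.asCharacters().getData().trim());
                    }
                } else if (event.isEndElement()) {
                    lines.add("EndElement " + event.asEndElement().getName().getLocalPart());
                } else if (event.isEndDocument()) {
                    lines.add("EndDocument");
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        trace(SAMPLE).forEach(System.out::println);
    }
}
```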

Choosing Between Cursor and Iterator APIs

It is reasonable to ask at this point, "Which API should I choose? Should I create instances of XMLStreamReader or XMLEventReader? Why are there two kinds of APIs anyway?"

Development Goals

The authors of the StAX specification targeted three types of developers:

  • Library and infrastructure developers - Create application servers, JAXM, JAXB, JAX-RPC and similar implementations; need highly efficient, low-level APIs with minimal extensibility requirements.
  • J2ME developers - Need small, simple, pull-parsing libraries, and have minimal extensibility needs.
  • J2EE and J2SE developers - Need clean, efficient pull-parsing libraries, plus need the flexibility to both read and write XML streams, create new event types, and extend XML document elements and attributes.

Given these wide-ranging development categories, the StAX authors felt it was more useful to define two small, efficient APIs rather than overloading one larger and necessarily more complex API.

Comparing Cursor and Iterator APIs

Before choosing between the cursor and iterator APIs, you should note a few things that you can do with the iterator API that you cannot do with the cursor API:

  • Objects created from the XMLEvent subclasses are immutable, and can be used in arrays, lists, and maps, and can be passed through your applications even after the parser has moved on to subsequent events.
  • You can create subtypes of XMLEvent that are either completely new information items or extensions of existing items but with additional methods.
  • You can add and remove events from an XML event stream in much simpler ways than with the cursor API.
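
The first and last points above can be sketched together: events are buffered in an ordinary list, the reader is closed, and a subset of the buffered events is written out afterwards. EventBufferSketch, buffer, and replay are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration class; not part of the StAX API.
class EventBufferSketch {

    // Reads every event into a list; the events stay usable after the reader is closed.
    static List<XMLEvent> buffer(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<XMLEvent> events = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                events.add(reader.nextEvent());
            }
        } finally {
            reader.close();
        }
        return events;
    }

    // Writes the buffered events back out, dropping the document start and end events.
    static String replay(List<XMLEvent> events) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        for (XMLEvent event : events) {
            if (event.isStartDocument() || event.isEndDocument()) {
                continue; // removing events is just a matter of not adding them
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(replay(buffer("<Book><Title>Yoga</Title></Book>")));
    }
}
```

With the cursor API the same effect would require copying each value out of the reader by hand, since the cursor's state is overwritten on every call to next().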

Similarly, keep some general recommendations in mind when making your choice:

  • If you are programming for a particularly memory-constrained environment, like J2ME, you can make smaller, more efficient code with the cursor API.
  • If performance is your highest priority--for example, when creating low-level libraries or infrastructure--the cursor API is more efficient.
  • If you want to create XML processing pipelines, use the iterator API.
  • If you want to modify the event stream, use the iterator API.
  • If you want your application to be able to handle pluggable processing of the event stream, use the iterator API.
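
As one possible shape for such a pipeline, the sketch below places an EventFilter between an event reader and an event writer so that comments and whitespace-only text never reach the output. FilterPipelineSketch and stripNoise are hypothetical names, and the filter rule is only an example of a pluggable stage.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

// Hypothetical illustration class; not part of the StAX API.
class FilterPipelineSketch {

    // reader -> filter -> writer: each stage only sees XMLEvent objects.
    static String stripNoise(String xml) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLEventReader source = inputFactory.createXMLEventReader(new StringReader(xml));

        // Accept everything except comments and whitespace-only character data.
        EventFilter keep = event ->
            event.getEventType() != XMLStreamConstants.COMMENT
                && !(event.isCharacters() && event.asCharacters().isWhiteSpace());
        XMLEventReader filtered = inputFactory.createFilteredReader(source, keep);

        StringWriter out = new StringWriter();
        XMLEventWriter sink = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        try {
            sink.add(filtered); // drains the filtered reader into the writer
            sink.flush();
        } finally {
            sink.close();
            filtered.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(
            stripNoise("<Book>\n  <!-- draft -->\n  <Title>Yoga</Title>\n</Book>"));
    }
}
```

Because the filter is an ordinary object passed to createFilteredReader, a different rule can be swapped in without touching the reading or writing code.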