[Repost] http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html
- Blog category:
- Java
Original: http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html
Validation is a powerful tool. It enables you to quickly check that input is roughly in the form you expect and quickly reject any document that is too far away from what your process can handle. If there's a problem with the data, it's better to find out earlier than later.
In the context of Extensible Markup Language (XML), validation normally involves writing a detailed specification for the document's contents in any of several schema languages such as the World Wide Web Consortium (W3C) XML Schema Language (XSD), RELAX NG, Document Type Definitions (DTDs), and Schematron. Sometimes validation is performed while parsing, sometimes immediately after. However, it's usually done before any further processing of the input takes place. (This description is painted with broad strokes -- there are exceptions.)
Until recently, the exact Application Programming Interface (API) by which programs requested validation varied with the schema language and parser. DTDs and XSD were normally accessed as configuration options in Simple API for XML (SAX), Document Object Model (DOM), and Java™ API for XML Processing (JAXP). RELAX NG required a custom library and API. Schematron might use the Transformations API for XML (TrAX), and still other schema languages required programmers to learn still more APIs, even though they were performing essentially the same operation.
Java 5 introduced the javax.xml.validation package to provide a schema-language-independent interface to validation services. This package is also available in Java 1.3 and later when you install JAXP 1.3 separately. Among other products, an implementation of this library is included with Xerces 2.8.
Validation
The javax.xml.validation API uses three classes to validate documents: SchemaFactory, Schema, and Validator. It also makes extensive use of the javax.xml.transform.Source interface from TrAX to represent the XML documents. In brief, a SchemaFactory reads the schema document (often an XML file) from which it creates a Schema object. The Schema object creates a Validator object. Finally, the Validator object validates an XML document represented as a Source.
Listing 1 shows a simple program to validate a URL entered on the command line against the DocBook XSD schema.
Listing 1. Validating a DocBook document against the DocBook XSD schema
import java.io.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;
public class DocbookXSDCheck {
    public static void main(String[] args) throws SAXException, IOException {
        // 1. Look up a factory for the W3C XML Schema language
        SchemaFactory factory =
            SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        // 2. Compile the schema.
        // Here the schema is loaded from a java.io.File, but you could use
        // a java.net.URL or a javax.xml.transform.Source instead.
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
        // 3. Get a validator from the schema.
        Validator validator = schema.newValidator();
        // 4. Wrap the document you want to check in a Source.
        Source source = new StreamSource(args[0]);
        // 5. Check the document
        try {
            validator.validate(source);
            System.out.println(args[0] + " is valid.");
        }
        catch (SAXException ex) {
            System.out.println(args[0] + " is not valid because ");
            System.out.println(ex.getMessage());
        }
    }
}
Here's some typical output when checking an invalid document using the version of Xerces bundled with Java 2 Software Development Kit (JDK) 5.0:
file:///Users/elharo/CS905/Course_Notes.xml is not valid because cvc-complex-type.2.3: Element 'legalnotice' cannot have character [children], because the type's content type is element-only.
You can easily change the schema to validate against, the document to validate, and even the schema language. However, in all cases, validation follows these five steps:
1. Load a schema factory for the language the schema is written in.
2. Compile the schema from its source.
3. Create a validator from the compiled schema.
4. Create a Source object for the document you want to validate. A StreamSource is usually simplest.
5. Validate the input source. If the document is invalid, the validate() method throws a SAXException. Otherwise, it returns quietly.
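The five steps can be seen end to end in a self-contained form. This is a minimal sketch rather than code from the original article: the class name FiveSteps and the tiny inline schema (a note element that must contain an xs:int) are hypothetical, and both the schema and the instance are read from strings through StreamSource(Reader) so that nothing depends on files under /opt/xml.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class FiveSteps {

    // Hypothetical schema: a single <note> element whose content must be an xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:int'/>"
        + "</xs:schema>";

    static boolean isValid(String xml) throws SAXException, IOException {
        // 1. Load a schema factory for the W3C XML Schema language.
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // 2. Compile the schema from its source (here, a string).
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        // 3. Create a validator from the compiled schema.
        Validator validator = schema.newValidator();
        // 4. Wrap the document to check in a Source.
        Source source = new StreamSource(new StringReader(xml));
        // 5. Validate: an invalid document raises SAXException, a valid one returns quietly.
        try {
            validator.validate(source);
            return true;
        } catch (SAXException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws SAXException, IOException {
        System.out.println(isValid("<note>42</note>"));
        System.out.println(isValid("<note>forty-two</note>"));
    }
}
```

Running main should print true and then false, because "forty-two" is not a legal xs:int. In real code you would compile the schema once and reuse it rather than recompiling it on every call.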
You can reuse the same validator and the same schema multiple times in series. However, only the schema is thread safe; validators and schema factories are not. If you validate in multiple threads simultaneously, you can share one compiled Schema, but make sure each thread has its own Validator (and its own SchemaFactory, if it compiles schemas).
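One way to follow these thread-safety rules is to compile the Schema once and give every thread its own Validator. The sketch below assumes Java 8 or later (for ThreadLocal.withInitial and lambdas); the class name and the inline count schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SharedSchemaValidation {

    // Hypothetical schema used only for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:nonNegativeInteger'/>"
        + "</xs:schema>";

    // The compiled Schema is thread safe, so one instance is shared by all threads.
    static final Schema SCHEMA = compile();

    // Validators are not thread safe, so each thread lazily creates its own.
    static final ThreadLocal<Validator> VALIDATOR =
        ThreadLocal.withInitial(SCHEMA::newValidator);

    static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException ex) {
            throw new IllegalStateException("schema failed to compile", ex);
        }
    }

    static boolean isValid(String xml) {
        try {
            VALIDATOR.get().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = -2; i < 6; i++) {
            String doc = "<count>" + i + "</count>";
            results.add(pool.submit(() -> isValid(doc)));
        }
        for (Future<Boolean> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

The ThreadLocal keeps one Validator per pool thread, so validators are reused in series within a thread but never shared across threads.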
Validate against a document-specified schema
Some documents specify the schema they expect to be validated against, typically using xsi:noNamespaceSchemaLocation and/or xsi:schemaLocation attributes like this:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
...
If you create a schema without specifying a URL, file, or source, the factory creates one that looks in the document being validated to find the schema it should use. For example:
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
However, normally this isn't what you want. Usually the document consumer should choose the schema, not the document producer. Furthermore, this approach works only for XSD. All other schema languages require an explicitly specified schema location.
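For completeness, here is a sketch of the document-driven approach. It assumes the schema can be written to a temporary file whose file: URI is placed in xsi:noNamespaceSchemaLocation; the class, method, and element names are hypothetical. The consumer gives up control over which schema applies, which is exactly the drawback described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class DocumentSpecifiedSchema {

    // Writes a hypothetical one-element schema to a temporary file.
    static Path writeSchema() throws IOException {
        Path xsd = Files.createTempFile("count", ".xsd");
        String text = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='count' type='xs:int'/>"
            + "</xs:schema>";
        Files.write(xsd, text.getBytes(StandardCharsets.UTF_8));
        xsd.toFile().deleteOnExit();
        return xsd;
    }

    static boolean isValid(String content, Path xsd) throws SAXException, IOException {
        // The instance names its own schema through xsi:noNamespaceSchemaLocation.
        String xml = "<count xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:noNamespaceSchemaLocation='" + xsd.toUri() + "'>"
            + content + "</count>";
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // No argument: the schema is located from hints in each validated document.
        Schema schema = factory.newSchema();
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws SAXException, IOException {
        Path xsd = writeSchema();
        System.out.println(isValid("42", xsd));
        System.out.println(isValid("abc", xsd));
    }
}
```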
--------------------------------------------------------------------------------
Abstract factories
SchemaFactory is an abstract factory. The abstract factory design pattern enables this one API to support many different schema languages and object models. A single implementation usually supports only a subset of the numerous languages and models. However, once you learn the API for validating DOM documents against RELAX NG schemas (for instance), you can use the same API to validate JDOM documents against W3C schemas.
For example, Listing 2 shows a program that validates DocBook documents against DocBook's RELAX NG schema. It's almost identical to Listing 1. The only things that have changed are the location of the schema and the URL that identifies the schema language.
Listing 2. Validating a DocBook document using RELAX NG
import java.io.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;
public class DocbookRELAXNGCheck {
    public static void main(String[] args) throws SAXException, IOException {
        // 1. Specify you want a factory for RELAX NG
        SchemaFactory factory
            = SchemaFactory.newInstance("http://relaxng.org/ns/structure/1.0");
        // 2. Load the specific schema you want.
        // Here I load it from a java.io.File, but we could also use a
        // java.net.URL or a javax.xml.transform.Source
        File schemaLocation = new File("/opt/xml/docbook/rng/docbook.rng");
        // 3. Compile the schema.
        Schema schema = factory.newSchema(schemaLocation);
        // 4. Get a validator from the schema.
        Validator validator = schema.newValidator();
        // 5. Wrap the document you want to check in a Source.
        String input
            = "file:///Users/elharo/Projects/workspace/CS905/build/Java_Course_Notes.xml";
        Source source = new StreamSource(input);
        // 6. Check the document
        try {
            validator.validate(source);
            System.out.println(input + " is valid.");
        }
        catch (SAXException ex) {
            System.out.println(input + " is not valid because ");
            System.out.println(ex.getMessage());
        }
    }
}
If you run this program with the stock Sun JDK and no extra libraries, you'll probably see something like this:
Exception in thread "main" java.lang.IllegalArgumentException:
http://relaxng.org/ns/structure/1.0
at javax.xml.validation.SchemaFactory.newInstance(SchemaFactory.java:186)
at DocbookRELAXNGCheck.main(DocbookRELAXNGCheck.java:14)
This is because, out of the box, the JDK doesn't include a RELAX NG validator. When the schema language isn't recognized, SchemaFactory.newInstance() throws an IllegalArgumentException. However, if you install a RELAX NG library such as Jing and a JAXP 1.3 adapter, then it should produce the same answer the W3C schema does.
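Because an unsupported language shows up as an unchecked IllegalArgumentException, a program can probe for a schema language before relying on it. This is a minimal sketch with a hypothetical class name; whether RELAX NG reports true depends entirely on which libraries are on your classpath.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

public class SchemaLanguageProbe {

    // True if some installed implementation recognizes the schema-language URI.
    static boolean isSupported(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException ex) {
            // newInstance() signals an unrecognized language this way.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("W3C XML Schema: " + isSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("RELAX NG:       " + isSupported(XMLConstants.RELAXNG_NS_URI));
    }
}
```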
Identify the schema language
The javax.xml.XMLConstants class defines several constants to identify schema languages:
XMLConstants.W3C_XML_SCHEMA_NS_URI: http://www.w3.org/2001/XMLSchema
XMLConstants.RELAXNG_NS_URI: http://relaxng.org/ns/structure/1.0
XMLConstants.XML_DTD_NS_URI: http://www.w3.org/TR/REC-xml
This isn't a closed list. Implementations are free to add other URLs to this list to identify other schema languages. Typically, the URL is the namespace Uniform Resource Identifier (URI) for the schema language. For example, the URL http://www.ascc.net/xml/schematron identifies Schematron schemas.
Through the javax.xml.validation API, Sun's JDK 5 supports only XSD schemas. DTD validation is supported, but not through this API; for DTDs, you have to use the regular SAX XMLReader class. However, you can install additional libraries that add support for RELAX NG and other schema languages.
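A DTD check through SAX might look like the following sketch. It assumes the DTD is in the document's internal subset so nothing has to be fetched; the class name is hypothetical. Turning on validation is not enough by itself: SAX reports validity errors through ErrorHandler.error(), which DefaultHandler ignores, so the handler here rethrows them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {

    static boolean isValid(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setValidating(true); // enables DTD validation in the parser
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Promote recoverable validity errors to exceptions; fatalError already throws.
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException ex) throws SAXException {
                throw ex;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>hi</note>"));
        System.out.println(isValid("<!DOCTYPE note [<!ELEMENT note EMPTY>]><note>hi</note>"));
    }
}
```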
How schema factories are located
The Java programming language isn't limited to a single schema factory. When you pass a URI identifying a particular schema language to SchemaFactory.newInstance(), it searches the following locations in this order to find a matching factory:
1. The class named by the "javax.xml.validation.SchemaFactory:schemaURL" system property, where schemaURL is the URI that identifies the schema language
2. The class named by the "javax.xml.validation.SchemaFactory:schemaURL" property in the $java.home/lib/jaxp.properties file
3. javax.xml.validation.SchemaFactory service providers found in the META-INF/services directories of any available Java Archive (JAR) files
4. A platform default SchemaFactory, com.sun.org.apache.xerces.internal.jaxp.validation.xs.SchemaFactoryImpl in JDK 5
To add support for your own custom schema language and corresponding validator, write subclasses of SchemaFactory, Schema, and Validator (plus ValidatorHandler, since Schema.newValidatorHandler() is abstract) that know how to process your schema language. Then register your factory through one of the first three mechanisms; for a library, the usual choice is a META-INF/services/javax.xml.validation.SchemaFactory entry in your JAR that names your SchemaFactory subclass. This is useful for adding constraints that are more easily checked in a Turing-complete language like Java than in a declarative language like the W3C XML Schema language. You can define a mini-schema language, write a quick implementation, and plug it into the validation layer.
--------------------------------------------------------------------------------
Error handlers
By default, a validator throws a SAXException if there's a problem and does nothing if there isn't. However, you can provide a SAX ErrorHandler to receive more detailed information about the document's problems. For example, suppose you want to log all validation errors, but you don't want to stop processing when you encounter one. You can install an error handler such as the one in Listing 3.
Listing 3. An error handler that merely logs non-fatal validity errors
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ForgivingErrorHandler implements ErrorHandler {
    public void warning(SAXParseException ex) {
        System.err.println(ex.getMessage());
    }
    public void error(SAXParseException ex) {
        System.err.println(ex.getMessage());
    }
    public void fatalError(SAXParseException ex) throws SAXException {
        throw ex;
    }
}
To install this error handler, you create an instance of it and pass that instance to the Validator's setErrorHandler() method:
ErrorHandler lenient = new ForgivingErrorHandler();
validator.setErrorHandler(lenient);
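Listing 3 only prints messages. If you would rather examine all problems after validation finishes, an error handler can collect them instead. This is a minimal sketch; the class name, the inline scores schema, and the message format are hypothetical. A single bad value can produce more than one message from the validator, so the number of messages need not equal the number of bad elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CollectingErrorHandler implements ErrorHandler {

    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException ex) { record("warning", ex); }

    @Override
    public void error(SAXParseException ex) { record("error", ex); }

    // Fatal errors (malformed XML) still stop processing.
    @Override
    public void fatalError(SAXParseException ex) throws SAXException { throw ex; }

    private void record(String level, SAXParseException ex) {
        problems.add(level + " at line " + ex.getLineNumber() + ": " + ex.getMessage());
    }

    public List<String> getProblems() { return problems; }

    // Hypothetical schema: <scores> holding any number of integer <score> children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='scores'><xs:complexType><xs:sequence>"
        + "<xs:element name='score' type='xs:int' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static List<String> problemsIn(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        // Because error() does not throw, validation continues past each problem.
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.getProblems();
    }

    public static void main(String[] args) throws Exception {
        problemsIn("<scores><score>1</score><score>x</score><score>y</score></scores>")
            .forEach(System.out::println);
    }
}
```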
--------------------------------------------------------------------------------
Schema augmentation
Some schemas do more than validate. As well as providing a true-false answer to the question of whether a document is valid, they also augment the document with additional information. For example, they can provide default attribute values. They might also assign types like int or gYear to an element or attribute. The validator can create such type-augmented documents and write them onto a javax.xml.transform.Result object. All you need to do is pass a Result as the second argument to validate(). For example, Listing 4 both validates an input document and creates an augmented DOM document from the combination of the input with the schema.
Listing 4. Augmenting a document with a schema
import java.io.*;
import javax.xml.transform.dom.*;
import javax.xml.validation.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
public class DocbookXSDAugmenter {
    public static void main(String[] args)
      throws SAXException, IOException, ParserConfigurationException {
        SchemaFactory factory
            = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
        Validator validator = schema.newValidator();
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse(new File(args[0]));
        DOMSource source = new DOMSource(doc);
        DOMResult result = new DOMResult();
        try {
            validator.validate(source, result);
            Document augmented = (Document) result.getNode();
            // do whatever you need to do with the augmented document...
        }
        catch (SAXException ex) {
            System.out.println(args[0] + " is not valid because ");
            System.out.println(ex.getMessage());
        }
    }
}
This procedure can't transform an arbitrary source into an arbitrary result. It doesn't work at all for stream sources and results. SAX sources can be augmented into SAX results, and DOM sources into DOM results; but SAX sources can't be augmented to DOM results or vice versa. If you need to do that, first augment into the matching result -- SAX for SAX and DOM for DOM -- and then use TrAX's identity transform to change the model.
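The identity transform is simply a Transformer created without a stylesheet. The sketch below, with hypothetical class and method names, sends a DOM Document (such as the augmented document from Listing 4) through the identity transform into a SAX ContentHandler that counts elements, which is one way to move from the DOM model to the SAX model.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DomToSax {

    // A SAX ContentHandler that counts startElement events.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static int countElements(Document doc) throws TransformerException {
        // newTransformer() with no stylesheet is the TrAX identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        ElementCounter counter = new ElementCounter();
        identity.transform(new DOMSource(doc), new SAXResult(counter));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><title>T</title><para>P</para></book>");
        System.out.println(countElements(doc)); // 3
    }
}
```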
Relying on schema augmentation isn't recommended, though. Putting all the information the document requires in the instance is far more reliable than splitting it between the instance and the schema. You might validate, but not everyone will.
--------------------------------------------------------------------------------
Type information
The W3C XML Schema Language is heavily based on the notion of types. Elements and attributes are declared to be of type int, double, date, duration, person, PhoneNumber, or anything else you can imagine. The Java Validation API includes a means to report such types, although it's surprisingly independent of the rest of the package.
Types are identified by an org.w3c.dom.TypeInfo object. This simple interface, summarized in Listing 5, tells you the local name and namespace URI of a type. You can also tell whether and how a type is derived from another type. Beyond that, understanding the type is up to your program. The API doesn't tell you what the type means or convert the data to a Java type such as double or java.util.Date.
Listing 5. The DOM TypeInfo interface
package org.w3c.dom;
public interface TypeInfo {
    public static final int DERIVATION_RESTRICTION = 0x00000001;
    public static final int DERIVATION_EXTENSION   = 0x00000002;
    public static final int DERIVATION_UNION       = 0x00000004;
    public static final int DERIVATION_LIST        = 0x00000008;
    public String getTypeName();
    public String getTypeNamespace();
    public boolean isDerivedFrom(String namespace, String name, int derivationMethod);
}
To get TypeInfo objects, you ask the Schema object for a ValidatorHandler rather than a Validator. ValidatorHandler implements SAX's ContentHandler interface. Then, you install this handler in a SAX parser.
You also install your own ContentHandler in the ValidatorHandler (not the parser); the ValidatorHandler will forward the augmented events on to your ContentHandler.
The ValidatorHandler makes available a TypeInfoProvider that your ContentHandler can call at any time to find out the type of the current element or one of its attributes. It can also tell you whether an attribute is an ID, and whether the attribute was explicitly specified in the document or defaulted in from the schema. Listing 6 summarizes this class.
Listing 6. The TypeInfoProvider class
package javax.xml.validation;
public abstract class TypeInfoProvider {
    public abstract TypeInfo getElementTypeInfo();
    public abstract TypeInfo getAttributeTypeInfo(int index);
    public abstract boolean isIdAttribute(int index);
    public abstract boolean isSpecified(int index);
}
Finally, you parse the document with the SAX XMLReader. Listing 7 shows a simple program that uses all these classes and interfaces to print out the names of all the types of the elements in a document.
Listing 7. Listing element types
import java.io.*;
import javax.xml.validation.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class TypeLister extends DefaultHandler {
    private TypeInfoProvider provider;
    public TypeLister(TypeInfoProvider provider) {
        this.provider = provider;
    }
    public static void main(String[] args) throws SAXException, IOException {
        SchemaFactory factory
            = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
        ValidatorHandler vHandler = schema.newValidatorHandler();
        TypeInfoProvider provider = vHandler.getTypeInfoProvider();
        ContentHandler cHandler = new TypeLister(provider);
        vHandler.setContentHandler(cHandler);
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(vHandler);
        parser.parse(args[0]);
    }
    public void startElement(String namespace, String localName,
      String qualifiedName, Attributes atts) throws SAXException {
        String type = provider.getElementTypeInfo().getTypeName();
        System.out.println(qualifiedName + ": " + type);
    }
}
Here's the start of the output from running this code on a typical DocBook document:
book: #AnonType_book
title: #AnonType_title
subtitle: #AnonType_subtitle
info: #AnonType_info
copyright: #AnonType_copyright
year: #AnonType_year
holder: #AnonType_holder
author: #AnonType_author
personname: #AnonType_personname
firstname: #AnonType_firstname
othername: #AnonType_othername
surname: #AnonType_surname
personblurb: #AnonType_personblurb
para: #AnonType_para
link: #AnonType_link
As you can see, the DocBook schema assigns most elements anonymous complex types. Obviously, this will vary from one schema to the next.
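To see named types instead, you can run the same ValidatorHandler setup against a schema that declares them. This sketch uses a small hypothetical inline schema with a named complex type BookType, an xs:int attribute, and an xs:gYear child. It also swaps in a namespace-aware SAXParserFactory for XMLReaderFactory and reports attribute types through getAttributeTypeInfo(); the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamedTypeLister extends DefaultHandler {

    // Hypothetical schema with named and built-in types.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='BookType'><xs:sequence>"
        + "<xs:element name='year' type='xs:gYear'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='pages' type='xs:int'/>"
        + "</xs:complexType>"
        + "<xs:element name='book' type='BookType'/>"
        + "</xs:schema>";

    private final TypeInfoProvider provider;
    final List<String> lines = new ArrayList<>();

    NamedTypeLister(TypeInfoProvider provider) { this.provider = provider; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Type info is only valid while inside this callback.
        TypeInfo type = provider.getElementTypeInfo();
        lines.add(qName + ": " + type.getTypeName());
        for (int i = 0; i < atts.getLength(); i++) {
            lines.add("@" + atts.getQName(i) + ": " + provider.getAttributeTypeInfo(i).getTypeName());
        }
    }

    static List<String> listTypes(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vHandler = schema.newValidatorHandler();
        NamedTypeLister lister = new NamedTypeLister(vHandler.getTypeInfoProvider());
        vHandler.setContentHandler(lister);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // the ValidatorHandler needs namespace-aware events
        XMLReader parser = spf.newSAXParser().getXMLReader();
        parser.setContentHandler(vHandler);
        parser.parse(new InputSource(new StringReader(xml)));
        return lister.lines;
    }

    public static void main(String[] args) throws Exception {
        listTypes("<book pages='320'><year>2006</year></book>").forEach(System.out::println);
    }
}
```

For the sample document in main, the list should contain book: BookType, @pages: int, and year: gYear, in that order.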
--------------------------------------------------------------------------------
Conclusion
The world would be a poorer place if everyone spoke just one language. Programmers would be unhappy if they had only one programming language to choose from. Different languages suit different tasks better, and some tasks require more than one language. XML schemas are no different. You can choose from a plethora of useful schema languages. In Java 5 with javax.xml.validation, you have an API that can handle all of them.
Validation is a powerful tool. It enables you to quickly check that input is roughly in the form you expect and quickly reject any document that is too far away from what your process can handle. If there's a problem with the data, it's better to find out earlier than later.
In the context of Extensible Markup Language (XML), validation normally involves writing a detailed specification for the document's contents in any of several schema languages such as the World Wide Web Consortium (W3C) XML Schema Language (XSD), RELAX NG, Document Type Definitions (DTDs), and Schematron. Sometimes validation is performed while parsing, sometimes immediately after. However, it's usually done before any further processing of the input takes place. (This description is painted with broad strokes -- there are exceptions.)
Until recently, the exact Application Programming Interface (API) by which programs requested validation varied with the schema language and parser. DTDs and XSD were normally accessed as configuration options in Simple API for XML (SAX), Document Object Model (DOM), and Java™ API for XML Processing (JAXP). RELAX NG required a custom library and API. Schematron might use the Transformations API for XML(TrAX); and still other schema languages required programmers to learn still more APIs, even though they were performing essentially the same operation.
Java 5 introduced the javax.xml.validation package to provide a schema-language-independent interface to validation services. This package is also available in Java 1.3 and later when you install JAXP 1.3 separately. Among other products, an implementation of this library is included with Xerces 2.8.
Validation
The javax.xml.validation API uses three classes to validate documents: SchemaFactory, Schema, and Validator. It also makes extensive use of the javax.xml.transform.Source interface from TrAX to represent the XML documents. In brief, a SchemaFactory reads the schema document (often an XML file) from which it creates a Schema object. The Schema object creates a Validator object. Finally, the Validator object validates an XML document represented as a Source.
Listing 1 shows a simple program to validate a URL entered on the command line against the DocBook XSD schema.
Listing 1. Validating an Extensible Hypertext Markup Language (XHTML) document
import java.io.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;
public class DocbookXSDCheck {
public static void main(String[] args) throws SAXException, IOException {
// 1. Lookup a factory for the W3C XML Schema language
SchemaFactory factory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
// 2. Compile the schema.
// Here the schema is loaded from a java.io.File, but you could use
// a java.net.URL or a javax.xml.transform.Source instead.
File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
Schema schema = factory.newSchema(schemaLocation);
// 3. Get a validator from the schema.
Validator validator = schema.newValidator();
// 4. Parse the document you want to check.
Source source = new StreamSource(args[0]);
// 5. Check the document
try {
validator.validate(source);
System.out.println(args[0] + " is valid.");
}
catch (SAXException ex) {
System.out.println(args[0] + " is not valid because ");
System.out.println(ex.getMessage());
}
}
}
Here's some typical output when checking an invalid document using the version of Xerces bundled with Java 2 Software Development Kit (JDK) 5.0:
file:///Users/elharo/CS905/Course_Notes.xml is not valid because cvc-complex-type.2.3: Element 'legalnotice' cannot have character [children], because the type's content type is element-only.
You can easily change the schema to validate against, the document to validate, and even the schema language. However, in all cases, validation follows these five steps:
Load a schema factory for the language the schema is written in.
Compile the schema from its source.
Create a validator from the compiled schema.
Create a Source object for the document you want to validate. A StreamSource is usually simplest.
Validate the input source. If the document is invalid, the validate() method throws a SAXException. Otherwise, it returns quietly.
You can reuse the same validator and the same schema multiple times in series. However, only the schema is thread safe. Validators and schema factories are not. If you validate in multiple threads simultaneously, make sure each one has its own Validator and SchemaFactory objects.
Validate against a document-specified schema
Some documents specify the schema they expect to be validated against, typically using xsi:noNamespaceSchemaLocation and/or xsi:schemaLocation attributes like this:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
...
If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
However, normally this isn't what you want. Usually the document consumer should choose the schema, not the document producer. Furthermore, this approach works only for XSD. All other schema languages require an explicitly specified schema location.
--------------------------------------------------------------------------------
Abstract factories
SchemaFactory is an abstract factory. The abstract factory design pattern enables this one API to support many different schema languages and object models. A single implementation usually supports only a subset of the numerous languages and models. However, once you learn the API for validating DOM documents against RELAX NG schemas (for instance), you can use the same API to validate JDOM documents against W3C schemas.
For example, Listing 2 shows a program that validates DocBook documents against DocBook's RELAX NG schema. It's almost identical to Listing 1. The only things that have changed are the location of the schema and the URL that identifies the schema language.
Listing 2. Validating a DocBook document using RELAX NG
import java.io.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;
public class DocbookRELAXNGCheck {
public static void main(String[] args) throws SAXException, IOException {
// 1. Specify you want a factory for RELAX NG
SchemaFactory factory
= SchemaFactory.newInstance("http://relaxng.org/ns/structure/1.0");
// 2. Load the specific schema you want.
// Here I load it from a java.io.File, but we could also use a
// java.net.URL or a javax.xml.transform.Source
File schemaLocation = new File("/opt/xml/docbook/rng/docbook.rng");
// 3. Compile the schema.
Schema schema = factory.newSchema(schemaLocation);
// 4. Get a validator from the schema.
Validator validator = schema.newValidator();
// 5. Parse the document you want to check.
String input
= "file:///Users/elharo/Projects/workspace/CS905/build/Java_Course_Notes.xml";
// 6. Check the document
try {
validator.validate(source);
System.out.println(input + " is valid.");
}
catch (SAXException ex) {
System.out.println(input + " is not valid because ");
System.out.println(ex.getMessage());
}
}
}
If you run this program with the stock Sun JDK and no extra libraries, you'll probably see something like this:
Exception in thread "main" java.lang.IllegalArgumentException:
http://relaxng.org/ns/structure/1.0
at javax.xml.validation.SchemaFactory.newInstance(SchemaFactory.java:186)
at DocbookRELAXNGCheck.main(DocbookRELAXNGCheck.java:14)
This is because, out of the box, the JDK doesn't include a RELAX NG validator. When the schema language isn't recognized, SchemaFactory.newInstance() throws an IllegalArgumentException. However, if you install a RELAX NG library such as Jing and a JAXP 1.3 adapter, then it should produce the same answer the W3C schema does.
Identify the schema language
The javax.xml.constants class defines several constants to identify schema languages:
XMLConstants.W3C_XML_SCHEMA_NS_URI: http://www.w3.org/2001/XMLSchema
XMLConstants.RELAXNG_NS_URI: http://relaxng.org/ns/structure/1.0
XMLConstants.XML_DTD_NS_URI: http://www.w3.org/TR/REC-xml
This isn't a closed list. Implementations are free to add other URLs to this list to identify other schema languages. Typically, the URL is the namespace Uniform Resource Identifier (URI) for the schema language. For example, the URL http://www.ascc.net/xml/schematron identifies Schematron schemas.
Sun's JDK 5 only supports XSD schemas. Although DTD validation is supported, it isn't accessible through the javax.xml.validation API. For DTDs, you have to use the regular SAX XMLReader class. However, you can install additional libraries that add support for these and other schema languages.
How schema factories are located
The Java programming language isn't limited to a single schema factory. When you pass a URI identifying a particular schema language to SchemaFactory.newInstance(), it searches the following locations in this order to find a matching factory:
The class named by the "javax.xml.validation.SchemaFactory:schemaURL" system property
The class named by the "javax.xml.validation.SchemaFactory:schemaURL" property found in the $java.home/lib/jaxp.properties file
javax.xml.validation.SchemaFactory service providers found in the META-INF/services directories of any available Java Archive (JAR) files
A platform default SchemaFactory, com.sun.org.apache.xerces.internal.jaxp.validation.xs.SchemaFactoryImpl in JDK 5
To add support for your own custom schema language and corresponding validator, all you have to do is write subclasses of SchemaFactory, Schema, and Validator that know how to process your schema language. Then, install your JAR in one of these four locations. This is useful for adding constraints that are more easily checked in a Turing-complete language like Java than in a declarative language like the W3C XML Schema language. You can define a mini-schema language, write a quick implementation, and plug it into the validation layer.
--------------------------------------------------------------------------------
Error handlers
The default response from a schema is to throw a SAXException if there's a problem and do nothing if there isn't. However, you can provide a SAX ErrorHandler to receive more detailed information about the document's problems. For example, suppose you want to log all validation errors, but you don't want to stop processing when you encounter one. You can install an error handler such as that in Listing 3.
Listing 3. An error handler that merely logs non-fatal validity errors
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ForgivingErrorHandler implements ErrorHandler {
public void warning(SAXParseException ex) {
System.err.println(ex.getMessage());
}
public void error(SAXParseException ex) {
System.err.println(ex.getMessage());
}
public void fatalError(SAXParseException ex) throws SAXException {
throw ex;
}
}
To install this error handler, you create an instance of it and pass that instance to the Validator's setErrorHandler() method:
ErrorHandler lenient = new ForgivingErrorHandler();
validator.setErrorHandler(lenient);
--------------------------------------------------------------------------------
Schema augmentation
Some schemas do more than validate. As well as providing a true-false answer to the question of whether a document is valid, they also augment the document with additional information. For example, they can provide default attribute values. They might also assign types like int or gYear to an element or attribute. The validator can create such type-augmented documents and write them onto a javax.xml.transform.Result object. All you need to do is pass a Result as the second argument to validate. For example, Listing 4 both validates an input document and creates an augmented DOM document from the combination of the input with the schema.
Listing 4. Augmenting a document with a schema
import java.io.*;
import javax.xml.transform.dom.*;
import javax.xml.validation.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
public class DocbookXSDAugmenter {
    public static void main(String[] args)
      throws SAXException, IOException, ParserConfigurationException {
        SchemaFactory factory
          = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
        Validator validator = schema.newValidator();
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse(new File(args[0]));
        DOMSource source = new DOMSource(doc);
        DOMResult result = new DOMResult();
        try {
            validator.validate(source, result);
            Document augmented = (Document) result.getNode();
            // do whatever you need to do with the augmented document...
        }
        catch (SAXException ex) {
            System.out.println(args[0] + " is not valid because ");
            System.out.println(ex.getMessage());
        }
    }
}
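To see augmentation without the DocBook files, here is a small sketch. It assumes a hypothetical one-element schema whose status attribute has the default value "open". It follows the same DOMSource-to-DOMResult pattern as Listing 4. The default shows up in the augmented copy even though the input never mentions it.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultAttributeAugmenter {

    // Hypothetical schema: <task> holds text and an optional 'status' defaulting to "open".
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='task'><xs:complexType><xs:simpleContent>"
        + "<xs:extension base='xs:string'>"
        + "<xs:attribute name='status' type='xs:string' default='open'/>"
        + "</xs:extension>"
        + "</xs:simpleContent></xs:complexType></xs:element>"
        + "</xs:schema>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // schema validation needs a namespace-aware DOM
        return domFactory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Validates the DOM and returns the augmented copy written into a fresh DOMResult.
    public static Document augment(Document doc) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        DOMResult result = new DOMResult();             // no node: the validator supplies a Document
        validator.validate(new DOMSource(doc), result); // throws SAXException if invalid
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document original = parse("<task>Write report</task>");
        Element after = augment(original).getDocumentElement();
        System.out.println("input has status? " + original.getDocumentElement().hasAttribute("status"));
        System.out.println("augmented status = " + after.getAttribute("status"));
    }
}
```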
This procedure can't transform an arbitrary source into an arbitrary result: the source and the result must use the same model. In the original JAXP 1.3 implementation it doesn't work at all for stream sources and results (the Validator documentation in later Java versions also accepts a StreamSource paired with a StreamResult). SAX sources can be augmented into SAX results, and DOM sources into DOM results; but SAX sources can't be augmented to DOM results or vice versa. If you need to do that, first augment into the matching result (SAX for SAX and DOM for DOM) and then use TrAX's identity transform to change the model.
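Here is a sketch of the SAX-to-DOM route just described. It assumes a hypothetical task schema with a defaulted status attribute. The validator augments a SAXSource into a SAXResult whose handler is an identity TransformerHandler (one created without a stylesheet), and that handler builds the DOM. SaxAugmentToDom and the schema are illustrative names, not part of the API.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SaxAugmentToDom {

    // Hypothetical schema: <task> holds text and an optional 'status' defaulting to "open".
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='task'><xs:complexType><xs:simpleContent>"
        + "<xs:extension base='xs:string'>"
        + "<xs:attribute name='status' type='xs:string' default='open'/>"
        + "</xs:extension>"
        + "</xs:simpleContent></xs:complexType></xs:element>"
        + "</xs:schema>";

    public static Document augmentSaxIntoDom(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();

        // Identity TransformerHandler: copies whatever SAX events it receives into a DOM.
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TrAX implementation lacks SAX support");
        }
        TransformerHandler identity = ((SAXTransformerFactory) tf).newTransformerHandler();
        DOMResult dom = new DOMResult();
        identity.setResult(dom);

        // SAX in, SAX out: the validator forwards the augmented events
        // (including defaulted attributes) to the identity handler.
        SAXSource source = new SAXSource(new InputSource(new StringReader(xml)));
        validator.validate(source, new SAXResult(identity));

        return (Document) dom.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = augmentSaxIntoDom("<task>Write report</task>");
        System.out.println("status = " + doc.getDocumentElement().getAttribute("status"));
    }
}
```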
This technique isn't recommended, though. Putting all the information the document requires in the instance is far more reliable than splitting it between the instance and the schema. You might validate, but not everyone will.
--------------------------------------------------------------------------------
Type information
The W3C XML Schema Language is heavily based on the notion of types. Elements and attributes are declared to be of type int, double, date, duration, person, PhoneNumber, or anything else you can imagine. The Java Validation API includes a means to report such types, although it's surprisingly independent of the rest of the package.
Types are identified by an org.w3c.dom.TypeInfo object. This simple interface, summarized in Listing 5, tells you the local name and namespace URI of a type. You can also tell whether and how a type is derived from another type. Beyond that, understanding the type is up to your program: the API doesn't tell you what the type means or convert the data to a Java type such as double or java.util.Date.
Listing 5. The DOM TypeInfo interface
package org.w3c.dom;
public interface TypeInfo {
    public static final int DERIVATION_RESTRICTION = 0x00000001;
    public static final int DERIVATION_EXTENSION   = 0x00000002;
    public static final int DERIVATION_UNION       = 0x00000004;
    public static final int DERIVATION_LIST        = 0x00000008;
    public String getTypeName();
    public String getTypeNamespace();
    public boolean isDerivedFrom(String namespace, String name, int derivationMethod);
}
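Because the API stops at names, turning a typed value into a Java object is your program's job. The hypothetical XsdValueConverter below is one way to do that for a few XSD built-in types. It takes the namespace and local name that a TypeInfo reports, assumes the value has already been validated, and uses javax.xml.datatype for dates and durations. Every other type comes back unchanged as a string.

```java
import javax.xml.XMLConstants;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import org.w3c.dom.TypeInfo;

public final class XsdValueConverter {

    private static final String XSD_NS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    private XsdValueConverter() {}

    // Converts an already-validated lexical value of a few XSD built-in types.
    // Any other type, including user-defined ones, comes back as the original string.
    public static Object convert(String lexical, String typeNamespace, String typeName)
            throws DatatypeConfigurationException {
        if (!XSD_NS.equals(typeNamespace) || typeName == null) {
            return lexical;
        }
        String v = lexical.trim(); // these built-ins collapse surrounding whitespace
        switch (typeName) {
            case "int":
                return Integer.valueOf(v);
            case "double":
                // XSD spells infinity INF / -INF; Java's parser expects "Infinity"
                if (v.equals("INF")) return Double.POSITIVE_INFINITY;
                if (v.equals("-INF")) return Double.NEGATIVE_INFINITY;
                return Double.valueOf(v);
            case "boolean":
                return v.equals("true") || v.equals("1"); // lexical forms: true, false, 1, 0
            case "date":
            case "dateTime":
                return DatatypeFactory.newInstance().newXMLGregorianCalendar(v);
            case "duration":
                return DatatypeFactory.newInstance().newDuration(v);
            default:
                return lexical;
        }
    }

    // Convenience overload for a TypeInfo obtained from a TypeInfoProvider.
    public static Object convert(String lexical, TypeInfo type)
            throws DatatypeConfigurationException {
        if (type == null) {
            return lexical;
        }
        return convert(lexical, type.getTypeNamespace(), type.getTypeName());
    }
}
```

A fuller version could handle user-defined types, such as a type derived from xs:int. It would call isDerivedFrom() to find the nearest built-in ancestor before converting.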
To get TypeInfo objects, you ask the Schema object for a ValidatorHandler rather than a Validator. ValidatorHandler implements SAX's ContentHandler interface. Then, you install this handler in a SAX parser.
You also install your own ContentHandler in the ValidatorHandler (not the parser); the ValidatorHandler will forward the augmented events on to your ContentHandler.
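The ValidatorHandler makes available a TypeInfoProvider that your ContentHandler can query for the type of the current element or of one of its attributes. The provider only answers while an event is being dispatched to your handler. getElementTypeInfo() works inside startElement() and endElement(), and the attribute methods work only inside startElement(). It can also tell you whether an attribute is an ID, and whether the attribute was explicitly specified in the document or defaulted in from the schema. Listing 6 summarizes this class.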
Listing 6. The TypeInfoProvider class
package javax.xml.validation;
import org.w3c.dom.TypeInfo;
public abstract class TypeInfoProvider {
    protected TypeInfoProvider() {}
    public abstract TypeInfo getElementTypeInfo();
    public abstract TypeInfo getAttributeTypeInfo(int index);
    public abstract boolean isIdAttribute(int index);
    public abstract boolean isSpecified(int index);
}
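The attribute-related methods are easiest to see on a tiny example. This sketch assumes a hypothetical schema in which item has an ID-typed code attribute and a status attribute with a default. It wires a SAX parser, a ValidatorHandler, and a DefaultHandler together in the same arrangement as Listing 7. It gets its XMLReader from SAXParserFactory rather than XMLReaderFactory, which is deprecated in recent Java versions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeInfoLister extends DefaultHandler {

    // Hypothetical schema: <item> has an ID-typed 'code' and a 'status' defaulting to "new".
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='item'><xs:complexType>"
        + "<xs:attribute name='code' type='xs:ID' use='required'/>"
        + "<xs:attribute name='status' type='xs:string' default='new'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    private final TypeInfoProvider provider;
    private final List<String> report = new ArrayList<>();

    AttributeInfoLister(TypeInfoProvider provider) {
        this.provider = provider;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The attribute methods of TypeInfoProvider are only valid inside startElement.
        for (int i = 0; i < atts.getLength(); i++) {
            TypeInfo type = provider.getAttributeTypeInfo(i); // may be null
            report.add(atts.getQName(i)
                    + " type=" + (type == null ? "?" : type.getTypeName())
                    + " id=" + provider.isIdAttribute(i)
                    + " specified=" + provider.isSpecified(i));
        }
    }

    public static List<String> describe(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vHandler = schema.newValidatorHandler();
        AttributeInfoLister lister = new AttributeInfoLister(vHandler.getTypeInfoProvider());
        vHandler.setContentHandler(lister);  // validator forwards augmented events here

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);         // the validator expects namespace-aware events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(vHandler);  // parser -> validator -> lister
        reader.parse(new InputSource(new StringReader(xml)));
        return lister.report;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describe("<item code='a1'/>")) {
            System.out.println(line);
        }
    }
}
```

The status attribute never appears in the input. It reaches the handler only because the validator added its default, and isSpecified() is how a handler can tell such an attribute apart.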
Finally, you parse the document with the SAX XMLReader. Listing 7 shows a simple program that uses all these classes and interfaces to print the type name of each element in a document.
Listing 7. Listing element types
import java.io.*;
import javax.xml.validation.*;
import org.w3c.dom.TypeInfo;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class TypeLister extends DefaultHandler {
    private TypeInfoProvider provider;
    public TypeLister(TypeInfoProvider provider) {
        this.provider = provider;
    }
    public static void main(String[] args) throws SAXException, IOException {
        SchemaFactory factory
          = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
        // The ValidatorHandler sits between the parser and our own handler
        ValidatorHandler vHandler = schema.newValidatorHandler();
        TypeInfoProvider provider = vHandler.getTypeInfoProvider();
        ContentHandler cHandler = new TypeLister(provider);
        vHandler.setContentHandler(cHandler);
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(vHandler);
        // parse(String) expects a system ID (a URI), not a bare file path
        parser.parse(new File(args[0]).toURI().toString());
    }
    public void startElement(String namespace, String localName,
      String qualifiedName, Attributes atts) throws SAXException {
        // Valid only while startElement (or endElement) is being dispatched
        TypeInfo info = provider.getElementTypeInfo();
        String type = (info == null) ? "(no type)" : info.getTypeName();
        System.out.println(qualifiedName + ": " + type);
    }
}
Here's the start of the output from running this code on a typical DocBook document:
book: #AnonType_book
title: #AnonType_title
subtitle: #AnonType_subtitle
info: #AnonType_info
copyright: #AnonType_copyright
year: #AnonType_year
holder: #AnonType_holder
author: #AnonType_author
personname: #AnonType_personname
firstname: #AnonType_firstname
othername: #AnonType_othername
surname: #AnonType_surname
personblurb: #AnonType_personblurb
para: #AnonType_para
link: #AnonType_link
As you can see, the DocBook schema assigns most elements anonymous complex types. Obviously, this will vary from one schema to the next.
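DocBook's anonymous types leave little for isDerivedFrom() to report. The sketch below assumes a hypothetical schema with a named type, Percent, derived from xs:int by restriction. It records each element's type name along with whether that type restricts xs:int. Anonymous type names like those in the output above are generated by the processor, so expect them to differ between implementations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamedTypeLister extends DefaultHandler {

    // Hypothetical schema: 'Percent' is a named restriction of xs:int;
    // <report> has an anonymous complex type.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='Percent'><xs:restriction base='xs:int'>"
        + "<xs:minInclusive value='0'/><xs:maxInclusive value='100'/>"
        + "</xs:restriction></xs:simpleType>"
        + "<xs:element name='report'><xs:complexType><xs:sequence>"
        + "<xs:element name='score' type='Percent'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private final TypeInfoProvider provider;
    private final List<String> lines = new ArrayList<>();

    NamedTypeLister(TypeInfoProvider provider) {
        this.provider = provider;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        TypeInfo type = provider.getElementTypeInfo(); // null if no type was assigned
        String name = (type == null) ? "(none)" : type.getTypeName();
        boolean restrictsInt = type != null && type.isDerivedFrom(
                XMLConstants.W3C_XML_SCHEMA_NS_URI, "int", TypeInfo.DERIVATION_RESTRICTION);
        lines.add(qName + ": " + name + ", restricts xs:int: " + restrictsInt);
    }

    public static List<String> list(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vHandler = schema.newValidatorHandler();
        NamedTypeLister lister = new NamedTypeLister(vHandler.getTypeInfoProvider());
        vHandler.setContentHandler(lister);

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(vHandler); // parser -> validator -> lister
        reader.parse(new InputSource(new StringReader(xml)));
        return lister.lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : list("<report><score>87</score></report>")) {
            System.out.println(line);
        }
    }
}
```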
--------------------------------------------------------------------------------
Conclusion
The world would be a poorer place if everyone spoke just one language. Programmers would be unhappy if they had only one programming language to choose from. Different languages suit different tasks better, and some tasks require more than one language. XML schemas are no different. You can choose from a plethora of useful schema languages. With javax.xml.validation in Java 5, you have one API that can handle any of them, provided an implementation for that schema language is installed (the JDK itself is only required to support the W3C XML Schema Language).