A Brief Introduction to XInclude

 


It’s often convenient to divide long XML documents into multiple files. The classic example is a book, customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally this has been implemented via external entity references. For example,
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd"[
  <!ENTITY chapter1 SYSTEM "malapropisms.xml">
  <!ENTITY chapter2 SYSTEM "mispronunciations.xml">
  <!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
  <title>The Wit and Wisdom of George W. Bush</title>
  &chapter1;
  &chapter2;
  &chapter3;
</book>
However, external entity references have a number of limitations. Among them:

The individual component files cannot be treated in isolation. They often aren’t themselves full, well-formed XML documents. They cannot have document type declarations.

The document must have a DTD, and the parser must read the DTD. Not all parsers do.


If any of the pieces are missing, then the entire document is malformed. There’s no option for error recovery.


Only entire files can be included. You can’t include just one paragraph from a document.


There’s no way to include unparsed text such as an example Java program or XML document in a technical book. Only well-formed XML can be included, and all such XML is parsed. (SGML actually had this ability, but it was one of the features XML removed in the process of simplification.)


XInclude is an emerging specification from the W3C that endeavors to create a mechanism for building large XML documents out of their component parts which does not have these limitations. XInclude can combine multiple documents and parts thereof independently of validation. Each piece can be a complete XML document, a part of an XML document, or a non-XML text document like a Java program or an e-mail message.



Syntax

XInclude defines a single include element in the http://www.w3.org/2001/XInclude namespace. This can be mapped to any prefix, though xi is customary. (In the remainder of this article, I will simply assume the xi prefix has been bound to the correct namespace URI without further comment.) Each xi:include element has an href attribute that contains a URL pointing to the file to include. For example, using XIncludes instead of external entity references, the previous book example can be rewritten like this:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>
Of course you can also use absolute URLs where appropriate:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="http://www.whitehouse.gov/malapropisms.xml"/>
  <xi:include href="http://www.whitehouse.gov/mispronunciations.xml"/>
  <xi:include href="http://www.whitehouse.gov/madeupwords.xml"/>
</book>

XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="frontmatter.xml"/>
  <xi:include href="part1.xml"/>
  <xi:include href="part2.xml"/>
  <xi:include href="part3.xml"/>
  <xi:include href="backmatter.xml"/>
</book>
Each part might be further divided into a part intro and several chapters:
<?xml version="1.0"?>
<part xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro1.xml"/>
  <xi:include href="chapter_1.xml"/>
  <xi:include href="chapter_2.xml"/>
  <xi:include href="chapter_3.xml"/>
  <xi:include href="chapter_4.xml"/>
</part>
There’s no limit to how deep this can go. Only circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden.

When an XInclude processor reads an XML document, it resolves all references and returns a document that contains no XInclude elements. XInclusion is not part of XML 1.0 or the XML Information Set (Infoset). Thus, to actually understand such a document, you’ll normally need to pass it through an XInclude processor that replaces the xi:include elements with the documents they point to. This may be done by a server-side process, or it might be done on the client side by an XInclude-aware browser. It may be hooked into a custom SAX program using a SAX filter that resolves the XIncludes. It may even be offered as a parser option. However, it does not happen automatically. If you want it, install the necessary software and then explicitly tell the software to resolve the XInclude elements.

The Gnome Project’s libxml and my own XOM both support XInclude. For example, if you’re using the xmllint tool bundled with libxml, you specify the --xinclude flag to resolve the include elements like this:
$ xmllint --xinclude book.xml
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
<title>The Wit and Wisdom of George W. Bush</title>
<preface>
…
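
The recursive behaviour and the ban on circular inclusion can be sketched in a few lines of Python. This is a hand-written illustration, not how libxml or XOM work internally: the `resolve` and `_expand` functions and the in-memory `files` dictionary are assumptions made for the example. It handles only parse="xml" includes below the root element.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'


def resolve(href, files, active=()):
    """Parse files[href] and replace its xi:include elements, depth first."""
    if href in active:
        # href is already being resolved further up the chain: a cycle.
        chain = " -> ".join(active + (href,))
        raise ValueError("circular inclusion: " + chain)
    root = ET.fromstring(files[href])
    _expand(root, files, active + (href,))
    return root


def _expand(parent, files, active):
    for i, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            included = resolve(child.get("href"), files, active)
            included.tail = child.tail  # keep the whitespace after the include
            parent[i] = included
        else:
            _expand(child, files, active)


# Hypothetical documents standing in for files on disk.
files = {
    "book.xml": f'<book {NS}><xi:include href="part1.xml"/></book>',
    "part1.xml": f'<part {NS}><xi:include href="chapter_1.xml"/></part>',
    "chapter_1.xml": "<chapter>Chapter 1</chapter>",
}
book = resolve("book.xml", files)
print(ET.tostring(book, encoding="unicode"))

# Document A includes B, which includes A again: this must be rejected.
loop = {
    "a.xml": f'<a {NS}><xi:include href="b.xml"/></a>',
    "b.xml": f'<b {NS}><xi:include href="a.xml"/></b>',
}
try:
    resolve("a.xml", loop)
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

Tracking the chain of documents currently being resolved, rather than every document ever seen, lets the same file be included twice in different places while still rejecting a true cycle.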

Of course, there are APIs you can call from your own code, as well as programs to run from the command line. For instance, this code fragment resolves all the include elements in a XOM Document object and returns a new document that contains all the included content:
Document resolvedDocument = XIncluder.resolve(inputDocument);
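
For readers without XOM at hand, roughly the same step can be tried with Python's standard library module `xml.etree.ElementInclude`. This is only a sketch under stated assumptions: Python is not one of the tools this article discusses, the chapter contents in `FILES` are invented placeholders, and the `load` callback serves them from memory instead of from disk. Unlike the XOM call above, `ElementInclude.include` modifies the tree in place.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical stand-ins for the three chapter files.
FILES = {
    "malapropisms.xml": "<chapter><title>Malapropisms</title></chapter>",
    "mispronunciations.xml": "<chapter><title>Mispronunciations</title></chapter>",
    "madeupwords.xml": "<chapter><title>Made-up Words</title></chapter>",
}

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>"""


def load(href, parse, encoding=None):
    """Loader callback: an Element for parse="xml", a string for parse="text"."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


book = ET.fromstring(BOOK)
ElementInclude.include(book, loader=load)  # replaces the xi:include elements in place

print([child.tag for child in book])
print(ET.tostring(book, encoding="unicode"))
```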
Unparsed Text

Technical articles like this one often include example code: Java and C programs, XML and HTML documents, e-mail messages and text files, and so forth. Within these examples characters like < and & should be understood as raw text rather than parsed as markup. You can indicate that you want a particular included document to be treated as text by adding a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:
<code>
<xi:include parse="text" href="examples/SpellChecker.java" />
</code>
Processes that are downstream from the XInclusion will see the complete text of the file SpellChecker.java like they would any other text. For instance, such data would be passed to a SAX ContentHandler object’s characters() method. This is pretty much exactly the same way a parser would treat the content if it were typed in a CDATA section.
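
A small Python sketch makes the effect visible. It uses the standard library's `xml.etree.ElementInclude` as a stand-in processor; the one-line `JAVA_SOURCE` string is an invented placeholder for the real SpellChecker.java, and `load` is a hypothetical in-memory loader. After inclusion the `<` and `&` characters are plain character data, and they are escaped only when the tree is serialized again.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Placeholder for the contents of examples/SpellChecker.java.
JAVA_SOURCE = 'if (a < b && b > 0) { System.out.println("ok"); }\n'
FILES = {"examples/SpellChecker.java": JAVA_SOURCE}


def load(href, parse, encoding=None):
    """Hypothetical loader that only serves text resources from memory."""
    if parse != "text":
        raise ValueError("this sketch only serves text resources")
    return FILES[href]


doc = ET.fromstring(
    '<code xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="examples/SpellChecker.java"/>'
    "</code>"
)
ElementInclude.include(doc, loader=load)

# The included file is now ordinary text content of <code>, not markup.
print(repr(doc.text))
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```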

The XInclude processor will attempt to determine the character encoding of the text file from any available metadata, such as a charset parameter in the included document’s MIME type. If the document is an XML document, then the processor will next try to use the byte order mark, the encoding declaration and the other customary heuristics for determining the character encoding of an XML document. If neither of these is suitable, the character set can be specified explicitly by an encoding attribute using the same names used for the encoding declaration in an XML document. For example, this element includes a file that’s written in Latin-1:
<xi:include parse="text" encoding="ISO-8859-1"
            href="examples/SpellChecker.java" />

If none of these options are available, then the processor assumes the document is written in UTF-8.
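
The order of precedence just described can be written down as a short Python function. This is only a sketch of the decision chain: `choose_encoding` and its parameter names are invented for illustration, and a real processor would obtain the first two values from HTTP headers and from XML encoding detection.

```python
def choose_encoding(charset=None, xml_encoding=None, encoding_attr=None):
    """Return the first encoding that is known, in the order described above.

    charset       -- from external metadata such as the MIME type
    xml_encoding  -- detected from the byte order mark or encoding declaration,
                     only when the included resource is an XML document
    encoding_attr -- the encoding attribute on the xi:include element
    """
    for candidate in (charset, xml_encoding, encoding_attr):
        if candidate:
            return candidate
    return "UTF-8"  # the default when nothing else is available


# A Latin-1 file with no metadata: the encoding attribute decides.
raw = "// caf\u00e9\n".encode("ISO-8859-1")
text = raw.decode(choose_encoding(encoding_attr="ISO-8859-1"))
print(repr(text))

# Without the attribute the UTF-8 default applies, and these bytes do not decode.
try:
    raw.decode(choose_encoding())
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```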

Fallback

Servers crash. Network connections fail. The DNS system gets congested. For all these reasons and more, documents included from remote servers may be temporarily unavailable. The default action for an XInclude processor in such a case is simply to give up and report a fatal error. However, the xi:include element may contain an xi:fallback element that contains alternate content to be used if the requested resource cannot be found. For example, this xi:include element tries to load the file at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it provides some literal content instead:
<xi:include
  href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>

    <para>
    Our enemies are innovative and resourceful, and so are we.
    They never stop thinking about new ways to harm our country
    and our people, and neither do we.
    </para>

  </xi:fallback>
</xi:include>
The xi:fallback element can even include another xi:include element. For example, this xi:include element begins by attempting to include the document at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it will try http://politics.slate.msn.com/default.aspx?id=76886 instead.
<xi:include
  href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <xi:include href=
      "http://politics.slate.msn.com/default.aspx?id=76886" />
  </xi:fallback>
</xi:include>
The xi:fallback element is not used if the document can be located but is malformed. That is always a fatal error.

Include elements can contain other content besides the single xi:fallback element. For example, this xi:include element contains an xi:fallback and a para element:
<xi:include
  href="http://www.whitehouse.gov/malapropisms.xml">

  <para>
    Well, I think if you say you're going to do something
    and don't do it, that's trustworthiness.
  </para>

  <xi:fallback>
  <xi:include href="http://politics.slate.msn.com/default.aspx?id=76886"/>
  </xi:fallback>
</xi:include>
However, the processor will ignore all such content. When the xi:include element is replaced, the para element will silently vanish.
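
The fallback rules can be sketched in Python as follows. This is a hand-written illustration that does not rely on any XInclude library: `expand`, `fetch`, and the hrefs `missing.xml`, `also-missing.xml`, and `mirror.xml` are all hypothetical. A resource that cannot be retrieved triggers the fallback, a malformed one still fails with a parse error, and any other content inside xi:include is dropped. For brevity the sketch ignores bare text inside xi:fallback.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

# Only the mirror is reachable in this example.
AVAILABLE = {"mirror.xml": "<para>Mirror copy</para>"}


def fetch(href):
    """Hypothetical retrieval step; raises OSError when the resource is missing."""
    if href not in AVAILABLE:
        raise OSError("cannot retrieve " + href)
    return AVAILABLE[href]


def expand(parent, fetch):
    """Replace xi:include children of parent, using xi:fallback when fetch fails."""
    out = []
    for child in list(parent):
        if child.tag != XI + "include":
            expand(child, fetch)
            out.append(child)
            continue
        try:
            # A malformed document raises ET.ParseError here, which is not caught.
            replacement = [ET.fromstring(fetch(child.get("href")))]
        except OSError:
            fallback = child.find(XI + "fallback")
            if fallback is None:
                raise  # no fallback: the failure stays fatal
            expand(fallback, fetch)  # a fallback may itself contain includes
            replacement = list(fallback)
        out.extend(replacement)  # other children of xi:include are discarded
    parent[:] = out


doc = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="missing.xml">'
    "<para>ignored</para>"
    '<xi:fallback><xi:include href="mirror.xml"/></xi:fallback>'
    "</xi:include>"
    '<xi:include href="also-missing.xml">'
    "<xi:fallback><para>Literal fallback text.</para></xi:fallback>"
    "</xi:include>"
    "</book>"
)
expand(doc, fetch)
print([para.text for para in doc])
```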

XPointer

The URLs used in XInclude href attributes can have XPointer fragment identifiers. If so, only those parts of the external document selected by the XPointer are included. For example, this xi:include element includes only the malapropism elements from the document bushisms.xml:
<xi:include href="bushisms.xml#xpointer(//malapropism)"  /> 
Since XPointers can point up, down, and sideways in an XML document, and do not necessarily select a contiguous region of a document, they present significant problems for streaming applications and APIs like SAX, XNI, and StAX. Full XInclude with XPointer support really requires a tree-based API such as DOM or XOM, and can be expected to use at least as much memory as the sum of all the documents combined together.
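
To show why a tree is needed, here is a deliberately tiny Python sketch. It is not an XPointer implementation: the hypothetical `select` function understands only fragments of the form xpointer(//name), and the contents of `BUSHISMS` are invented. Even so, it has to parse the whole document into memory before it can pick out the matching elements, which need not be adjacent.

```python
import re
import xml.etree.ElementTree as ET

# Invented stand-in for bushisms.xml.
BUSHISMS = """<bushisms>
  <malapropism>first</malapropism>
  <mispronunciation>nucular</mispronunciation>
  <section><malapropism>second</malapropism></section>
</bushisms>"""


def select(href, load):
    """Return the elements chosen by an href with an optional xpointer(//name)."""
    url, _, fragment = href.partition("#")
    root = ET.fromstring(load(url))  # the whole document is parsed first
    if not fragment:
        return [root]
    match = re.fullmatch(r"xpointer\(//([\w.-]+)\)", fragment)
    if match is None:
        raise ValueError("unsupported XPointer: " + fragment)
    # iter() walks the tree in document order and yields every matching element.
    return list(root.iter(match.group(1)))


selected = select("bushisms.xml#xpointer(//malapropism)", lambda url: BUSHISMS)
print([element.text for element in selected])
```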

Validation and other processes

One of the most common questions about XInclude is how inclusion interacts with validation, XSL transformation, and other processes that may be applied to an XML document. The short answer is that it doesn’t. XInclusion is not part of any other XML process. It is a separate step which you may or may not perform when and where it is useful to you.

For example, consider validation against a schema. A document can be validated before or after inclusion, or both, or neither. If you validate the document before the xi:include elements are replaced, then the schema has to declare the xi:include elements just like it would declare any other element. If you validate the document after the xi:include elements are replaced, then the schema has to declare the replacement elements. You can even write a single schema that covers both cases, by using a choice to permit an element to contain either an xi:include element or its replacement elements.

For another example, consider XSL transformation. XSLT was defined several years before XInclude. The XSLT algorithm operates on well-formed XML documents. An XSLT processor acts on xi:include elements exactly like it acts on any other element; that is, it finds a template rule that matches elements with the local name include in the http://www.w3.org/2001/XInclude namespace and instantiates that rule’s template. It does not automatically replace the xi:include elements. Of course, if you want the xi:include elements to be replaced before the stylesheet is applied, you can first use an XInclude processor to resolve the includes and generate a new XML document, then pass the new document to the XSLT processor along with the stylesheet. You can even resolve the includes, pass the merged document to the XSLT processor for transformation, and then resolve includes again on the output of the transformation in case the stylesheet inserted any new xi:include elements. Inclusion and transformation are separate and orthogonal processes that can be performed in whichever order is convenient in the local environment. There is no canonical processing model for XML.
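
The independence of the two steps can be seen in a small Python sketch. Here `summarize` is a hypothetical stand-in for any downstream process such as validation or transformation, the chapter content is invented, and the standard library's `xml.etree.ElementInclude` plays the XInclude processor. The same process sees a different document depending on whether it runs before or after inclusion.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
CHAPTER = "<chapter><title>Malapropisms</title></chapter>"  # invented content


def load(href, parse, encoding=None):
    """Hypothetical loader that serves the same chapter for every href."""
    return ET.fromstring(CHAPTER) if parse == "xml" else CHAPTER


def summarize(root):
    """Stand-in for a downstream process; it just counts child elements."""
    return {
        "includes": len(root.findall(XI_INCLUDE)),
        "chapters": len(root.findall("chapter")),
    }


book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="malapropisms.xml"/>'
    "</book>"
)

before = summarize(book)  # sees xi:include as an ordinary element
ElementInclude.include(book, loader=load)  # the explicit inclusion step
after = summarize(book)  # sees the replacement content

print(before)
print(after)
```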

You cannot simply place include elements in a document and expect them to be resolved automatically. There’s always an extra step where you tell some piece of software somewhere to resolve the XIncludes. Depending on the environment, this may be a command line flag, an option in a config file, or a separate program you run manually. However, assuming you can do that, XInclude is a very useful technique for authoring large documents in multiple, smaller, more manageable pieces.