An invalid XML character

sjsky

XML Exception: Invalid Characters

This blog has moved to: http://www.micmiu.com

When parsing XML files, a program may throw exceptions such as the following:

Quote

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x{2}) was found in the value of attribute "{1}" and element is "1f".

Quote

An invalid XML character (Unicode: 0x1d) was found in the CDATA section.

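These errors are caused by invisible special characters that are illegal in an XML document, so the XML parser throws an exception when it reaches them. Tab (0x09), line feed (0x0a), and carriage return (0x0d) are allowed; the remaining control characters below 0x20 are not. XML 1.0 also forbids unpaired surrogates, 0xFFFE, and 0xFFFF, which this post does not cover. The invalid control characters fall into three ranges: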

0x00 - 0x08
0x0b - 0x0c
0x0e - 0x1f
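To locate the offending character before deciding how to handle it, a check against the XML 1.0 `Char` production can help. The following is a minimal sketch, not part of the original post; the class name `XmlChars` and both method names are hypothetical.

```java
final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first invalid character, or -1 if there is none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair when needed.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // An attribute value containing the invisible character 0x1f.
        String bad = "<c n=\"j" + (char) 0x1f + "\"/>";
        System.out.println("first invalid index=" + firstInvalidIndex(bad));
    }
}
```

Iterating by code point rather than by `char` means a lone surrogate is reported as invalid while a well-formed surrogate pair is accepted.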

The fix is to filter these illegal characters out of the string before parsing:

string.replaceAll("[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "")
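`String.replaceAll` recompiles the regular expression on every call, so when many documents are filtered it may be worth compiling the pattern once. A minimal sketch of such a helper follows; the class name `XmlCharFilter` and the method name `stripInvalidControlChars` are hypothetical, and the character class is the same one used above.

```java
import java.util.regex.Pattern;

final class XmlCharFilter {
    // Same three ranges as the one-liner: 0x00-0x08, 0x0b-0x0c, 0x0e-0x1f.
    private static final Pattern INVALID_CONTROL_CHARS =
            Pattern.compile("[\\x00-\\x08\\x0b\\x0c\\x0e-\\x1f]");

    private XmlCharFilter() {}

    /** Removes control characters that XML 1.0 does not allow; null stays null. */
    static String stripInvalidControlChars(String s) {
        if (s == null) {
            return null;
        }
        return INVALID_CONTROL_CHARS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "<r n=\"j" + (char) 0x1f + "\"/>";
        String clean = stripInvalidControlChars(dirty);
        System.out.println(dirty.length() + " -> " + clean.length());
    }
}
```

Tab, line feed, and carriage return are left untouched, so the layout of the document is preserved.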

Test code: TestXmlInvalidChar.java

package michael.xml;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * Shows that an invisible control character breaks XML parsing and that
 * filtering it out beforehand lets the parse succeed.
 *
 * @author michael
 */
public class TestXmlInvalidChar {

    /**
     * @param args unused
     */
    public static void main(String[] args) {

        // The test string should be: <r><c d="s" n="j"></c></r>
        // Its normal byte array is:
        byte[] ba1 = new byte[] { 60, 114, 62, 60, 99, 32, 100, 61, 34, 115,
                34, 32, 110, 61, 34, 106, 34, 62, 60, 47, 99, 62, 60, 47, 114,
                62 };
        System.out.println("ba1     length=" + ba1.length);
        // Decode with an explicit charset instead of the platform default.
        String ba1str = new String(ba1, StandardCharsets.UTF_8);
        System.out.println(ba1str);
        System.out.println("ba1str  length=" + ba1str.length());
        System.out.println("-----------------------------------------");
        // Compared with the normal array, this one has an extra invisible byte 31 (0x1f).
        byte[] ba2 = new byte[] { 60, 114, 62, 60, 99, 32, 100, 61, 34, 115,
                34, 32, 110, 61, 34, 106, 31, 34, 62, 60, 47, 99, 62, 60, 47,
                114, 62 };
        System.out.println("ba2     length=" + ba2.length);
        String ba2str = new String(ba2, StandardCharsets.UTF_8);
        System.out.println(ba2str);
        System.out.println("ba2str  length=" + ba2str.length());
        System.out.println("-----------------------------------------");
        try {
            DocumentBuilderFactory dbfactory = DocumentBuilderFactory
                    .newInstance();
            dbfactory.setIgnoringComments(true);
            DocumentBuilder docBuilder = dbfactory.newDocumentBuilder();

            // Filter out the illegal invisible characters; without this
            // step the XML parser throws an exception.
            String filter = ba2str.replaceAll(
                    "[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "");
            System.out.println("length after filtering=" + filter.length());
            ByteArrayInputStream bais = new ByteArrayInputStream(filter
                    .getBytes(StandardCharsets.UTF_8));
            Document doc = docBuilder.parse(bais);
            Element rootEl = doc.getDocumentElement();
            System.out.println("parsed OK after filtering, root child length="
                    + rootEl.getChildNodes().getLength());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

Running the test code produces the following output:

Quote

ba1     length=26
<r><c d="s" n="j"></c></r>
ba1str  length=26
-----------------------------------------
ba2     length=27
<r><c d="s" n="j"></c></r>
ba2str  length=27
-----------------------------------------
length after filtering=26
parsed OK after filtering, root child length=1

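Comparing the two cases, the byte arrays and the strings differ in length (26 versus 27), yet they look identical when printed to the console, because the extra character (byte 31, 0x1f) is invisible. After filtering, the string length drops back to 26.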
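Because the console hides the difference, it can be useful when debugging to print control characters in an escaped form. The following is a minimal sketch under that assumption; the class name `ControlCharDump` and the method name `escapeControlChars` are hypothetical and not part of the original test.

```java
final class ControlCharDump {
    private ControlCharDump() {}

    /** Replaces each disallowed control character with a visible hex escape. */
    static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (c < 0x20 && !allowedWhitespace) {
                // Emit a backslash, the letter u, and four hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same content as ba2str in the test: an extra 0x1f after the "j".
        String ba2str = "<r><c d=\"s\" n=\"j" + (char) 31 + "\"></c></r>";
        System.out.println(escapeControlChars(ba2str));
    }
}
```

With this, the extra character in `ba2str` shows up in the printed text instead of silently disappearing.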


2011-05-23 12:09