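When parsing an XML document with SAX, an "Invalid byte 1 of 1-byte UTF-8 sequence" exception was thrown whenever the XML file contained Chinese characters. Debugging did not reveal the cause, so I searched online, found the answer there, and fixed the problem. Thanks to everyone who shared their notes.
The XML file declared UTF-8 but was actually saved in a different encoding (GB2312/GBK), so a file containing Chinese characters could not be parsed. After changing the declared encoding to "GB2312", so that it matches the file's real encoding, it parsed normally: <?xml version="1.0" encoding="GB2312"?>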
My own summary:
1. Analysis and fix for the "org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence." exception:
Analysis:
The exception is thrown by the reader.read(file); statement below:
SAXReader reader = new SAXReader();
Document doc = reader.read(file);
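The cause of the exception:
The XML file being read is actually encoded in GBK (or some other encoding), but its declaration, <?xml version="1.0" encoding="utf-8"?>, says utf-8. The parser decodes the bytes as the declaration tells it to, runs into bytes that are not valid UTF-8, and throws.
Note: according to the online article 《Java/J2EE中文问题终极解决之道》 ("The Ultimate Solution to Chinese Character Problems in Java/J2EE"), the root of the problem is that the operating system's default encoding is GBK while the XML declares utf-8. Any Java API that falls back on the platform default (FileWriter, or String.getBytes() with no argument) then produces GBK bytes that do not match the declaration, so a conversion is needed and garbled text or this exception follows. The fix: make the file's actual encoding agree with the encoding used by the Java API that reads or writes it.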
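The mismatch described above can be reproduced without any XML at all. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, and the GBK bytes for "中文" are hard-coded so the example does not depend on which charsets the local JVM ships.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Returns true only if the bytes are a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "中文" as GBK bytes: D6 D0 CE C4.
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        // The same text ("\u4e2d\u6587") encoded as UTF-8.
        byte[] utf8Bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(gbkBytes));  // prints false
        System.out.println(isValidUtf8(utf8Bytes)); // prints true
    }
}
```

A parser that trusts an encoding="utf-8" declaration is in the same position as the strict decoder here: GBK bytes simply are not a legal UTF-8 sequence.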
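Fix:
Case 1: the XML file is generated by dom4j.
Solution: create the writer with
    org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
            new FileOutputStream(fileName));
instead of
    xmlWriter = new XMLWriter(new FileWriter(fileName));
so that the XML file is written as utf-8. Given an OutputStream, XMLWriter does the encoding itself (utf-8 by default); FileWriter encodes with the platform default charset, which is GBK here.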
Detailed reference 1:
"Dom4j 编码问题彻底解决" (a thorough fix for the Dom4j encoding problem), by lonsen
http://www.5inet.net/Develop/Java/036579,Dom4j_BianMaWenDiCheDeJieJue.aspx
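The same principle can be sketched without dom4j on the classpath. In the example below, the JDK's built-in DOM parser stands in for SAXReader, and an OutputStreamWriter with an explicit charset stands in for XMLWriter over a FileOutputStream; the class name, method names, and the tiny document are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class WriteUtf8Xml {

    /** Writes a one-element document whose bytes really are UTF-8, as declared. */
    static void writeXml(File file, String text) throws IOException {
        // Explicit charset: unlike new FileWriter(file), this ignores the platform default.
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            // Sketch only: text is assumed to contain no markup characters such as < or &.
            w.write("<root>" + text + "</root>");
        }
    }

    /** Parses the file with the JDK parser and returns the root element's text. */
    static String readRootText(File file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("demo", ".xml");
        f.deleteOnExit();
        writeXml(f, "\u4e2d\u6587");
        System.out.println("\u4e2d\u6587".equals(readRootText(f))); // prints true
    }
}
```

Because the writer's charset and the declaration agree, the result does not change with the operating system's default encoding.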
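Case 2: reader.read() throws the exception while parsing XML that a user entered on a JSP page.
Solution:
Convert the XML content to utf-8 before calling read, using the overloads that take an explicit encoding: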
public void validate(FacesContext context, UIComponent component, Object obj)
        throws ValidatorException {
    String xmldescription = (String) obj;
    // Encode explicitly; the no-argument getBytes() uses the platform default (e.g. GBK).
    byte[] bytes = xmldescription.getBytes("utf-8");
    RelationXmlParser.isXmlOK("E://jiangcm//templateXMLSchema.xsd", bytes);
    ……
}

public static boolean isXmlOK(String xsdFile, byte[] tagetXml)
        throws SAXException, IOException, DocumentException {
    SAXReader reader = new SAXReader();
    ……
    InputStream in = new ByteArrayInputStream(tagetXml);
    // Decode with the same charset that was used to encode the bytes.
    InputStreamReader utf8In = new InputStreamReader(in, "utf-8");
    ……
}
My own fix: String.getBytes("utf-8"), that is, pass the charset explicitly instead of relying on the platform default.
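As a rough, dom4j-free illustration of Case 2, the sketch below encodes a user-supplied XML string with an explicit charset and hands the bytes to the JDK's built-in parser. The class name, method name, and sample document are hypothetical; with dom4j the byte stream would go to SAXReader.read instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseStringXml {

    /** Parses XML held in a String and returns the root element's text. */
    static String rootText(String xml) throws Exception {
        // The charset used here must match the encoding named in the XML declaration.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><name>\u4e2d\u6587</name>";
        System.out.println("\u4e2d\u6587".equals(rootText(xml))); // prints true
    }
}
```

Calling xml.getBytes() with no argument on a GBK system would produce the same mismatch as in Case 1, only in memory rather than on disk.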