How to read an XML file without validation

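   Posted: 2010-06-21
Today I ran into a problem. I was using dom4j to read an XML file that declares a DTD, but I do not have that DTD file. When I read the XML file with SAXReader, it failed with the following error:

java.io.FileNotFoundException: [DTD file name] (The system cannot find the file specified)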
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:106)
	at java.io.FileInputStream.<init>(FileInputStream.java:66)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
	at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1192)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1089)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1002)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
	at org.dom4j.io.SAXReader.read(SAXReader.java:465)
	at org.dom4j.io.SAXReader.read(SAXReader.java:321)


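It looked as if Xerces was checking the document against the DTD on its own. The XML file is actually valid; I only want to read some data from it and have no need for validation, and I cannot remove the DTD reference from the XML either. I figured that turning off the default checking would be enough. After going through the dom4j documentation, I switched off every setting I could think of: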

	reader.setValidation(false);
	reader.setIncludeInternalDTDDeclarations(false);
	reader.setIncludeExternalDTDDeclarations(false);
	reader.setFeature("http://apache.org/xml/features/validation", false);


That still did not work, so I had no choice but to step through the source. Eventually I found this code in Xerces's XMLDocumentScannerImpl:

	if (((fValidation || fLoadExternalDTD) 
		&& (fValidationManager == null || !fValidationManager.isCachedDTD()))) {
		// This handles the case of a DOCTYPE that had neither an internal subset or an external subset.
		fDTDScanner.setInputSource(fExternalSubsetSource);
		fExternalSubsetSource = null;
		if (!fDisallowDoctype)
			setScannerState(SCANNER_STATE_DTD_EXTERNAL_DECLS);
		else
			setScannerState(SCANNER_STATE_PROLOG);
		setDriver(fContentDriver);
		if(fDTDDriver == null)
			fDTDDriver = new DTDDriver();
		return fDTDDriver.next();
	}


Both fValidation and fLoadExternalDTD have to be false before this branch is skipped. After reading more of the code, I finally found the solution:
	SAXReader reader = new SAXReader(false);
	reader.setValidation(false);
	reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
	Document document = reader.read(xmlFile);


Here setValidation(false) sets fValidation to false, and setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) sets fLoadExternalDTD to false.
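The effect of the load-external-dtd switch can also be tried without dom4j on the classpath, since SAXReader hands the actual parsing to a SAX parser. The following is a minimal sketch, assuming the JDK's bundled Xerces-derived parser (which recognizes this feature). The class name ExternalDtdSwitchDemo, the helper parseRootName, and the DTD file name are made up for illustration and are not part of the original post.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ExternalDtdSwitchDemo {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical document: it references a DTD file that does not exist.
    static final String XML =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE config SYSTEM \"no-such-file-for-demo.dtd\">\n"
            + "<config><item>value</item></config>";

    /** Parses XML with validation off and returns the root element name. */
    static String parseRootName(boolean loadExternalDtd) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);                           // corresponds to fValidation
        factory.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd); // corresponds to fLoadExternalDTD
        SAXParser parser = factory.newSAXParser();

        final String[] root = new String[1];
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first (root) element
                }
            }
        });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("load-external-dtd=false: root is " + parseRootName(false));
        try {
            parseRootName(true);
        } catch (IOException e) {
            // With DTD loading left on, the parser tries to open the missing file.
            System.out.println("load-external-dtd=true: " + e.getClass().getName());
        }
    }
}
```

Turning validation off alone is not enough in this sketch either: the parse only succeeds once the load-external-dtd feature is also false, which matches the condition found in XMLDocumentScannerImpl.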

I hope this helps anyone else who runs into the same problem.
   Posted: 2010-06-21
SAXBuilder builder = new SAXBuilder();
builder.setValidation(false);
// Resolve every external entity, including the DTD, to empty content
builder.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
    }
});
Document document = builder.build(new File(xmlName));
   Posted: 2010-06-21
Oh, so that works too. Thanks for the tip.
It does seem a little more complicated than my approach, though.
   Posted: 2010-06-22
Thanks, this is really useful.
   Posted: 2010-08-09
Here is another solution:
        // Adding this line tolerates the "DTD file not found" error
        reader.setEntityResolver(new MyEntityResolver());


public class MyEntityResolver implements EntityResolver
{
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException
    {
        // Hand the parser a stand-in that contains only a text declaration
        return new InputSource(new ByteArrayInputStream(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>".getBytes("UTF-8")));
    }
}
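The EntityResolver approach from the two replies above is not tied to JDOM or dom4j. As a rough sketch, assuming only the JDK's JAXP DocumentBuilder, the same idea looks like this; the class name EmptyDtdResolverDemo, the helper parseIgnoringDtd, and the DTD file name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EmptyDtdResolverDemo {

    /** Parses the XML text, answering every external entity request with empty content. */
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // The parser still asks for the external DTD, but the resolver
        // substitutes an empty stream, so no file is ever opened.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document that points at a DTD file that does not exist.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE config SYSTEM \"no-such-file-for-demo.dtd\">\n"
                + "<config><item>value</item></config>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

One trade-off to keep in mind: because the DTD is replaced with empty content, any entities it would have declared are undefined, so a document that references them will still fail to parse.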
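   Posted: 2010-08-10
Thanks to the original poster for sharing. I had never run into this situation before, but now I will know how to handle it when I do.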
   Posted: 2011-01-13
One more addition.

If you are reading the XML with the JDK's DocumentBuilder, you can do this:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(file);
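A possible variant of the snippet above uses DocumentBuilderFactory.setFeature, which is the standard JAXP method for boolean switches, whereas setAttribute passes the value through to the underlying implementation. This is a minimal sketch, assuming the JDK's bundled parser recognizes the load-external-dtd feature; the class name DomNoExternalDtdDemo, the helper parseWithoutDtd, and the DTD file name are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomNoExternalDtdDemo {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Builds a DOM tree without ever trying to open the external DTD. */
    static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                 // no DTD validation
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // do not fetch the external subset
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document that points at a DTD file that does not exist.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE config SYSTEM \"no-such-file-for-demo.dtd\">\n"
                + "<config><item>value</item></config>";
        Document doc = parseWithoutDtd(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```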