Exception from the setString method of PreparedStatement (repost)
Driver in use: ojdbc14.
When executing an INSERT statement, on reaching the call
stmt.setString(2, myString);
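the following exception was thrown:
java.sql.SQLException: 数据大小超出此类型的最大值 (the English form of this message is "Data size bigger than max size for this type")
At that point myString held a little over 700 Chinese characters (the test with English characters used 2000).
That would mean a single Chinese character takes 3-4 bytes, and the count for English characters does not add up either, which seems hard to believe.
Possible causes:
1. Before the SQL statement is sent to the database, PreparedStatement preprocesses the string and applies escaping;
2. The character set.
Reading the PreparedStatement documentation turns up a setCharacterStream method that gets around the problem: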
stmt.setCharacterStream(2, new java.io.StringReader(myString), myString.length());
After this change, more than 1400 Chinese characters could be inserted.
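As a minimal sketch of the workaround, the call can be wrapped in a small helper. The class name LongStringBinder and the method name bindAsStream are hypothetical and chosen only for illustration; the only JDBC call used is the standard PreparedStatement.setCharacterStream(int, Reader, int).

```java
import java.io.Reader;
import java.io.StringReader;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not part of any driver or library.
final class LongStringBinder {

    // Binds a String parameter through a character stream instead of setString.
    static void bindAsStream(PreparedStatement stmt, int index, String value)
            throws SQLException {
        // StringReader exposes the String as a java.io.Reader.
        Reader reader = new StringReader(value);
        // The third argument is the length in characters, not in bytes.
        stmt.setCharacterStream(index, reader, value.length());
    }
}
```

With this in place, the failing call would become LongStringBinder.bindAsStream(stmt, 2, myString). Whether this lifts the limit for a given driver version still has to be checked by testing.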
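The earlier look at the string-length problem in PreparedStatement's setString did not resolve it fully,
nor did it dig into the underlying cause.
This time, Oracle's JDBC documentation is used to trace where the problem comes from.
Oracle provides two client access modes, OCI and thin.
In each mode, string conversion works as follows:
1. JDBC OCI driver:
The JDBC documentation says: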
“If the value of NLS_LANG is set to a character set other than US7ASCII or WE8ISO8859P1, then the driver uses UTF8 as the client character set. This happens automatically and does not require any user intervention. OCI then converts the data from the database character set to UTF8. The JDBC OCI driver then passes the UTF8 data to the JDBC Class Library where the UTF8 data is converted to UTF-16.”
2. JDBC thin driver:
The JDBC documentation says:
“If the database character set is neither ASCII (US7ASCII) nor ISO Latin1 (WE8ISO8859P1), then the JDBC thin driver must impose size restrictions for SQL CHAR bind parameters that are more restrictive than normal database size limitations. This is necessary to allow for data expansion during conversion.
The JDBC thin driver checks SQL CHAR bind sizes when a setXXX() method (except for the setCharacterStream() method) is called. If the data size exceeds the size restriction, then the driver returns a SQL exception (SQLException: Data size bigger than max size for this type) from the setXXX() call. This limitation is necessary to avoid the chance of data corruption when conversion of character data occurs and increases the length of the data. This limitation is enforced in the following situations:
(1) Using the JDBC thin driver
(2) Using binds (not defines)
(3) Using SQL CHAR datatypes
(4) Connecting to a database whose character set is neither ASCII (US7ASCII) nor ISO Latin1 (WE8ISO8859P1)
When the database character set is neither US7ASCII nor WE8ISO8859P1, the JDBC thin driver converts Java UTF-16 characters to UTF-8 encoding bytes for SQL CHAR binds. The UTF-8 encoding bytes are then transferred to the database, and the database converts the UTF-8 encoding bytes to the database character set encoding.”
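So the JDBC driver itself limits the string length during conversion. This limit is unrelated to the actual length of the column in the database.
Because setCharacterStream() is exempt from this conversion limit, it is one way to work around the problem.
The driver imposes the limit to leave room for data expansion during character set conversion.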
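The expansion in question can be seen without a database. The sketch below only measures how many bytes a Java String occupies once encoded as UTF-8, the encoding the quoted documentation says the thin driver uses for SQL CHAR binds. The class and method names are hypothetical, and the Chinese sample text is written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not touch JDBC at all.
final class Utf8ExpansionDemo {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "abc";
        String chinese = "\u4e2d\u6587\u5b57"; // three Chinese characters
        // ASCII letters stay at one byte each in UTF-8.
        System.out.println(english.length() + " chars, " + utf8Length(english) + " bytes");
        // Each of these Chinese characters becomes three bytes in UTF-8.
        System.out.println(chinese.length() + " chars, " + utf8Length(chinese) + " bytes");
    }
}
```

A string of Chinese text can therefore triple in byte size during the UTF-16 to UTF-8 step, which is the kind of growth the driver's size check has to allow for.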
In actual tests with the ZHS16GBK character set and the thin driver, VARCHAR columns of length 2000 to 4000 all accepted at most 1333 bytes (about 666 Chinese characters).
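One possible reading of these numbers, offered as an assumption and not confirmed by the quoted documentation: if the driver caps a SQL CHAR bind at 4000 bytes and reserves room for a worst case of 3 UTF-8 bytes per character, the safe size is 4000 / 3 = 1333. At 2 bytes per Chinese character in ZHS16GBK, 1333 bytes hold 666 characters. The constants below are labeled as assumptions for that reason.

```java
// Hypothetical arithmetic sketch; the two ASSUMED_ constants are guesses
// that happen to reproduce the measured figures, not documented values.
final class BindLimitArithmetic {

    // Assumption: maximum size of a SQL CHAR bind, in bytes.
    static final int ASSUMED_MAX_BIND_BYTES = 4000;

    // Assumption: worst-case UTF-8 bytes per UTF-16 code unit.
    static final int ASSUMED_MAX_EXPANSION = 3;

    // A Chinese character takes 2 bytes in GBK (ZHS16GBK).
    static final int GBK_BYTES_PER_CHINESE_CHAR = 2;

    // Size that is still safe after worst-case expansion (integer division).
    static int maxSafeBytes() {
        return ASSUMED_MAX_BIND_BYTES / ASSUMED_MAX_EXPANSION;
    }

    // How many 2-byte Chinese characters fit into that size.
    static int maxChineseChars() {
        return maxSafeBytes() / GBK_BYTES_PER_CHINESE_CHAR;
    }

    public static void main(String[] args) {
        System.out.println(maxSafeBytes() + " bytes, " + maxChineseChars() + " Chinese characters");
    }
}
```

This would also be consistent with the limit staying the same for every column length between 2000 and 4000, since it depends only on the bind size and not on the column.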
Note: switching to the latest Oracle 10g driver may allow more data to be inserted, since this is a driver issue.