Java PDFBox 0.8: the UniGB-UCS2-H problem

shappy1978

http://drunkfish.spaces.live.com/blog/cns!FC3E3585A287F598!372.entry

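Reading the source, I found a package, org.apache.pdfbox.encoding.conversion, that already contains converters for the various Chinese encodings. Oddly, nothing calls them anywhere, so parsing a PDF document still throws an IOException saying that a font such as UniGB-UCS2-H cannot be found. It looks like the only fix is to patch the source. After studying the code carefully, I found that the place to change is org.apache.pdfbox.pdmodel.font.PDFont.java.

First, add a method that returns the font's encoding name: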

    // Returns the name stored in the font's /Encoding entry,
    // or null if the entry is missing or is not a name.
    public String getEncodingName() {
        COSBase encoding = font.getDictionaryObject(COSName.ENCODING);
        if (encoding instanceof COSName) {
            return ((COSName) encoding).getName();
        }
        return null;
    }
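For context: in a composite (Type0) font, the /Encoding entry is either a name identifying a predefined CMap such as UniGB-UCS2-H, or a stream holding an embedded CMap, which is why the method returns a name only when the entry is a COSName. Below is a minimal, library-free sketch of that check. `EncodingNameSketch`, its `Name` class, and the plain `Map` standing in for the font dictionary are all hypothetical and are not part of PDFBox.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the font dictionary lookup; not PDFBox code.
final class EncodingNameSketch {

    // Models a PDF name object such as /UniGB-UCS2-H.
    static final class Name {
        final String value;
        Name(String value) { this.value = value; }
    }

    // Returns the encoding name if /Encoding is a name, otherwise null
    // (for example when it is missing or is an embedded CMap stream).
    static String getEncodingName(Map<String, Object> fontDict) {
        Object encoding = fontDict.get("Encoding");
        if (encoding instanceof Name) {
            return ((Name) encoding).value;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> named = new HashMap<>();
        named.put("Encoding", new Name("UniGB-UCS2-H"));

        Map<String, Object> embedded = new HashMap<>();
        embedded.put("Encoding", new byte[] {1, 2, 3}); // stands in for a CMap stream

        System.out.println(getEncodingName(named));
        System.out.println(getEncodingName(embedded));
    }
}
```

A null result means "no named encoding", so the caller should fall back to the normal CMap lookup.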

Then modify this method:

public String encode( byte[] c, int offset, int length ) throws IOException

......

// Around line 420; the original code is:

        if( retval == null && cmap != null )
        {
                retval = cmap.lookup( c, offset, length );
        }
        //if we havn't found a value yet and
        //we are still on the first byte and
        //there is no cmap or the cmap does not have 2 byte mappings then try to encode
        //using fallback methods.

// Change it to:

        if( retval == null && cmap != null )
        {
            String encodingStr = getEncodingName();
            if (encodingStr != null) {
                EncodingConverter converter = EncodingConversionManager.getConverter(encodingStr);
                if (converter != null) {
                    if (length == 1) return null;
                    retval = converter.convertBytes(c, offset, length, cmap);
                } else {
                    retval = cmap.lookup( c, offset, length );
                }
            } else {
                retval = cmap.lookup( c, offset, length );
            }
        }
        //if we havn't found a value yet and
        //we are still on the first byte and
        //there is no cmap or the cmap does not have 2 byte mappings then try to encode
        //using fallback methods.

Tests pass; problem solved.

//*********************************************************shappy's comments

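I searched the web for this problem for a long time and wasted an afternoon on it. Many people ask about it, but most of the answers only muddy the waters, for example by explaining how to write Chinese text into a PDF. Thanks to the author above: after building with ant, the patch works. There are two problems, though:

1 With the code above, Chinese text set in a non-Chinese font (such as system) is parsed as garbage, so I changed the code as follows: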

        if( retval == null && cmap != null )
        {
            // Try the regular CMap lookup first so non-Chinese fonts keep working.
            retval = cmap.lookup( c, offset, length );
            if (retval == null) {
                // Fall back to the encoding converter only when the CMap has no mapping.
                String encodingStr = getEncodingName();
                if (encodingStr != null) {
                    EncodingConverter converter = EncodingConversionManager.getConverter(encodingStr);
                    if (converter != null) {
                        if (length == 1) return null;
                        retval = converter.convertBytes(c, offset, length, cmap);
                    }
                }
            }
        }

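2 When parsing files that use the UniGB-UCS2-H font, a space is inserted between some characters, which does not happen with other fonts. This is probably a bug in EncodingConversionManager; I have not looked into it in detail.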
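The lookup order from item 1 (CMap first, converter only as a fallback) can be modelled without PDFBox. In this rough sketch a `Map` keyed by the hex form of the code stands in for the CMap and a `Function` stands in for the converter. `FallbackLookupSketch`, `lookup`, and `toHex` are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the revised lookup order; not PDFBox code.
final class FallbackLookupSketch {

    // cmap maps a code (as a hex string) to text; the converter is
    // consulted only when the CMap has no entry for the code.
    static String lookup(Map<String, String> cmap,
                         Function<byte[], String> converter,
                         byte[] code) {
        String retval = cmap.get(toHex(code));
        if (retval == null && converter != null) {
            if (code.length == 1) {
                return null; // single byte: let the caller retry with two bytes
            }
            retval = converter.apply(code);
        }
        return retval;
    }

    // Formats the code bytes as lowercase hex, e.g. {0x4E, 0x2D} -> "4e2d".
    static String toHex(byte[] code) {
        StringBuilder sb = new StringBuilder();
        for (byte b : code) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

With this order, a code that the CMap already maps never reaches the converter, which is the behaviour item 1 relies on for non-Chinese fonts.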

2009-11-23 18:08

#1 linustseng 2011-07-25

Hello.
This seems to be the fix for the old version. Is there a fix for the new version?
