On the ZipInputStream Garbled Entry Name Problem

yangxiutian

Blog category: Reading the JDK

Tags: ZipInputStream, garbled text

    private static String getUTF8String(byte[] b, int off, int len) {
	// First, count the number of characters in the sequence
	int count = 0;
	int max = off + len;
	int i = off;
	while (i < max) {
	    int c = b[i++] & 0xff;
	    switch (c >> 4) {
	    case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
		// 0xxxxxxx
		count++;
		break;
	    case 12: case 13:
		// 110xxxxx 10xxxxxx
		if ((int)(b[i++] & 0xc0) != 0x80) {
		    throw new IllegalArgumentException();
		}
		count++;
		break;
	    case 14:
		// 1110xxxx 10xxxxxx 10xxxxxx
		if (((int)(b[i++] & 0xc0) != 0x80) ||
		    ((int)(b[i++] & 0xc0) != 0x80)) {
		    throw new IllegalArgumentException();
		}
		count++;
		break;
	    default:
		// 10xxxxxx, 1111xxxx
		throw new IllegalArgumentException();
	    }
	}
	if (i != max) {
	    throw new IllegalArgumentException();
	}
	// Now decode the characters...
	char[] cs = new char[count];
	i = 0;
	while (off < max) {
	    int c = b[off++] & 0xff;
	    switch (c >> 4) {
	    case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
		// 0xxxxxxx
		cs[i++] = (char)c;
		break;
	    case 12: case 13:
		// 110xxxxx 10xxxxxx
		cs[i++] = (char)(((c & 0x1f) << 6) | (b[off++] & 0x3f));
		break;
	    case 14:
		// 1110xxxx 10xxxxxx 10xxxxxx
		int t = (b[off++] & 0x3f) << 6;
		cs[i++] = (char)(((c & 0x0f) << 12) | t | (b[off++] & 0x3f));
		break;
	    default:
		// 10xxxxxx, 1111xxxx
		throw new IllegalArgumentException();
	    }
	}
	return new String(cs, 0, count);
    }

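The method above is the one in ZipInputStream that throws an exception when it reads a zip archive whose entry names contain Chinese characters. It has two while loops: the first counts the number of characters (count) in the entry name, and the second decodes the byte array with the UTF-8 algorithm and builds the string. Any byte sequence that does not fit the 1- to 3-byte UTF-8 patterns it handles makes it throw IllegalArgumentException immediately. Because the default encoding on Chinese systems is GBK, archives created there typically store Chinese entry names as GBK bytes, so any entry name containing Chinese characters triggers the exception.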
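To see why such names fail, here is a minimal sketch (the class `GbkVsUtf8Demo` and its helper `isValidUtf8` are hypothetical names) that encodes a Chinese file name with GBK and then checks the bytes with a UTF-8 decoder set to report errors, which roughly mirrors the exception-throwing behavior of the JDK method above. It assumes the runtime includes the GBK charset, as standard JDK builds do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class GbkVsUtf8Demo {
    // Returns true if the bytes are valid UTF-8, using a decoder that reports
    // (instead of replacing) malformed or unmappable input.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String name = "\u4e2d\u6587.txt"; // "中文.txt"
        byte[] gbkBytes = name.getBytes(gbk);

        // The GBK bytes of the Chinese characters are not a valid UTF-8 sequence...
        System.out.println(isValidUtf8(gbkBytes)); // false
        // ...but decoding them with the charset that produced them round-trips.
        System.out.println(new String(gbkBytes, gbk).equals(name)); // true
    }
}
```

The first byte of "中" in GBK is 0xD6, which looks like a 2-byte UTF-8 lead byte, but the byte after it (0xD0) is not of the form 10xxxxxx, so a strict UTF-8 decoder rejects it.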

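I'm not sure whether the preferred style is to count the characters in a separate pass or to use a "growable array", but the method could also be written like this: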

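// requires: import java.io.CharArrayWriter;
private static String getUTF8String(byte[] b, int off, int len) {
    // CharArrayWriter grows its buffer on demand, so no counting pass is needed
    CharArrayWriter caw = new CharArrayWriter();
    int max = off + len;
    while (off < max) {
        int c = b[off++] & 0xff;
        switch (c >> 4) {
        case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
            // 0xxxxxxx
            caw.append((char) c);
            break;
        case 12: case 13:
            // 110xxxxx 10xxxxxx: the continuation byte must exist and be valid
            if (off >= max || (b[off] & 0xc0) != 0x80) {
                throw new IllegalArgumentException();
            }
            caw.append((char) (((c & 0x1f) << 6) | (b[off++] & 0x3f)));
            break;
        case 14: {
            // 1110xxxx 10xxxxxx 10xxxxxx: check both continuation bytes
            if (off + 1 >= max || (b[off] & 0xc0) != 0x80
                    || (b[off + 1] & 0xc0) != 0x80) {
                throw new IllegalArgumentException();
            }
            int t = (b[off++] & 0x3f) << 6;
            caw.append((char) (((c & 0x0f) << 12) | t | (b[off++] & 0x3f)));
            break;
        }
        default:
            // 10xxxxxx, 1111xxxx
            throw new IllegalArgumentException();
        }
    }
    return caw.toString();
}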

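In other words, the counting pass is replaced by CharArrayWriter and its ability to enlarge its internal array as needed. By default it allocates an initial buffer of 32 chars, which I think is reasonable. Which of the two styles do you prefer?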
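The growable-array behavior is easy to check in isolation. This small sketch (the class `CharArrayWriterGrowthDemo` and method `fill` are hypothetical) appends more than 32 characters to a CharArrayWriter created with the no-arg constructor and shows that the caller never needs the final length in advance.

```java
import java.io.CharArrayWriter;

public class CharArrayWriterGrowthDemo {
    // Appends n letters to a CharArrayWriter created with the no-arg constructor.
    // Its internal buffer starts at 32 chars and append() enlarges it
    // automatically, so n does not have to be known up front.
    static char[] fill(int n) {
        CharArrayWriter caw = new CharArrayWriter();
        for (int i = 0; i < n; i++) {
            caw.append((char) ('a' + i % 26));
        }
        // toCharArray() returns a copy holding exactly size() chars
        return caw.toCharArray();
    }

    public static void main(String[] args) {
        char[] chars = fill(100);
        System.out.println(chars.length); // 100
    }
}
```

The trade-off between the two styles: the counting version scans the bytes twice but allocates the char array exactly once, while the CharArrayWriter version scans once but may copy its buffer several times as it grows, plus one final copy in toCharArray() or toString(). For short strings such as zip entry names, either cost is small.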

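Next, let's look at the garbled-name problem itself. The String class already knows how to decode bytes, so I don't see why the author implemented the decoding by hand, and then irresponsibly threw an exception on anything it couldn't handle. Was it just to show off? Using String's decoding, the method can be written as: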

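// requires: import java.io.UnsupportedEncodingException;
private static String getUTF8String(byte[] b, int off, int len) {
    try {
        // Let String do the decoding instead of a hand-written UTF-8 decoder
        return new String(b, off, len, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // Cannot actually happen: every Java platform must support UTF-8
        e.printStackTrace();
        return "encoding error!";
    }
}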

How's that? Simple enough.
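One thing worth noting about the String-based version: it removes the exception but does not by itself fix the garbled names. The sketch below (the class `ReplacementDemo` and its `decode` helper are hypothetical) uses the `String(byte[], int, int, Charset)` constructor, available since Java 6, whose documented behavior is to replace malformed input with the charset's default replacement (U+FFFD for UTF-8); the `String charsetName` overload used above documents its behavior on invalid bytes as unspecified. It assumes the runtime provides GBK.

```java
import java.nio.charset.Charset;

public class ReplacementDemo {
    // Decodes with String(byte[], int, int, Charset), which never throws on
    // bad input: malformed sequences become the replacement character U+FFFD.
    static String decode(byte[] b, int off, int len, Charset cs) {
        return new String(b, off, len, cs);
    }

    public static void main(String[] args) {
        byte[] gbkName = "\u4e2d\u6587.txt".getBytes(Charset.forName("GBK"));
        String asUtf8 = decode(gbkName, 0, gbkName.length, Charset.forName("UTF-8"));
        // No exception, but the Chinese part comes out as replacement characters
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0); // true
        // The ASCII suffix survives because the decoder resynchronizes on it
        System.out.println(asUtf8.endsWith(".txt")); // true
    }
}
```

So a GBK entry name decoded as UTF-8 simply turns into replacement characters, which is why the charset actually used by the archive has to be configurable.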

To let users choose the encoding themselves, we can add a static field and a setter method, which gives:

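// requires: import java.io.UnsupportedEncodingException;
//           import java.nio.charset.Charset;
// Defaults to the platform's default charset (e.g. GBK on a Chinese system).
// Note: the field is static, so the setting applies to every ZipInputStream in the JVM.
private static String filenameEncoding = Charset.defaultCharset().name();

public static void setFilenameEncoding(String encoding) {
    filenameEncoding = encoding;
}

private static String getUTF8String(byte[] b, int off, int len) {
    try {
        // Decode entry names with the user-selected encoding instead of fixed UTF-8
        return new String(b, off, len, filenameEncoding);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
        return "encoding error!";
    }
}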

Now it's "perfect".
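For readers on newer JDKs: since Java 7, ZipInputStream has a `ZipInputStream(InputStream, Charset)` constructor that decodes entry names with the given charset (the charset is ignored for entries whose UTF-8 language-encoding flag is set). Unlike the static setter above, the charset is chosen per stream, so different archives can be read with different encodings at the same time. The sketch below (the class `ZipCharsetDemo` and helpers `makeZip` and `firstEntryName` are hypothetical) builds a zip in memory with GBK entry names via `ZipOutputStream(OutputStream, Charset)` and reads it back; it assumes Java 8 or later (for UncheckedIOException) and a runtime that includes GBK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCharsetDemo {
    // Builds an in-memory zip whose single entry name is encoded with cs,
    // imitating an archive made by a tool that writes names in GBK.
    static byte[] makeZip(String entryName, Charset cs) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos, cs)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Returns the name of the first entry, decoding entry names with cs.
    static String firstEntryName(byte[] zip, Charset cs) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String name = "\u4e2d\u6587.txt"; // "中文.txt"
        byte[] zip = makeZip(name, gbk);
        System.out.println(name.equals(firstEntryName(zip, gbk))); // true
    }
}
```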

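Postscript: I have tested the code above myself and it works, though I don't know why Sun didn't write it this way. The point of this post is to exchange ideas. My writing isn't great, so please bear with me; I just hope the idea comes across and that you'll share your opinions.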

2012-09-10 11:02