On the garbled file name problem in ZipInputStream

yangxiutian, posted 2012-09-10

    private static String getUTF8String(byte[] b, int off, int len) {
        // First, count the number of characters in the sequence
        int count = 0;
        int max = off + len;
        int i = off;
        while (i < max) {
            int c = b[i++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                count++;
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                if ((int)(b[i++] & 0xc0) != 0x80) {
                    throw new IllegalArgumentException();
                }
                count++;
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                if (((int)(b[i++] & 0xc0) != 0x80) ||
                    ((int)(b[i++] & 0xc0) != 0x80)) {
                    throw new IllegalArgumentException();
                }
                count++;
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        if (i != max) {
            throw new IllegalArgumentException();
        }
        // Now decode the characters...
        char[] cs = new char[count];
        i = 0;
        while (off < max) {
            int c = b[off++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                cs[i++] = (char)c;
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                cs[i++] = (char)(((c & 0x1f) << 6) | (b[off++] & 0x3f));
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                int t = (b[off++] & 0x3f) << 6;
                cs[i++] = (char)(((c & 0x0f) << 12) | t | (b[off++] & 0x3f));
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        return new String(cs, 0, count);
    }

The above is the method in ZipInputStream that throws an exception when reading a zip archive whose entry names contain Chinese characters. It has two while loops: the first counts the number of characters in the entry name (count), and the second decodes the byte array using the UTF-8 algorithm. Any byte sequence that is not valid UTF-8 causes an exception straight away. Because the default encoding on a Chinese-language system is GBK, entry names written there are GBK bytes, so an entry name containing Chinese makes this method throw.

I don't know whether it is considered better practice to count in a loop first or to use a "growable array". The method could be rewritten as follows:

    // requires: import java.io.CharArrayWriter;
    private static String getUTF8String(byte[] b, int off, int len) {
        CharArrayWriter caw = new CharArrayWriter();
        int max = off + len; // was missing in the posted version
        while (off < max) {
            int c = b[off++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                caw.append((char) c);
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                // (the append was commented out in the posted version, which dropped the character)
                caw.append((char) (((c & 0x1f) << 6) | (b[off++] & 0x3f)));
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                int t = (b[off++] & 0x3f) << 6;
                caw.append((char) (((c & 0x0f) << 12) | t | (b[off++] & 0x3f)));
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        char[] ch = caw.toCharArray();
        return new String(ch, 0, ch.length);
    }

In other words, the character-counting pass is replaced by the CharArrayWriter class, relying on its ability to grow its internal array. By default the class allocates an initial array of 32 chars, which I personally think is reasonable. Note that this single-pass version no longer checks that continuation bytes have the form 10xxxxxx, which the counting loop did. I'd like to hear which style people prefer.

Next, the garbled-name problem itself. The String class already has byte decoding built in, so I don't know why the author implemented decoding by hand here, and irresponsibly threw exceptions on top of that. Was it to show off? Using String's own decoding, the method becomes:

    // requires: import java.io.UnsupportedEncodingException;
    private static String getUTF8String(byte[] paramArrayOfByte, int paramInt1, int paramInt2) {
        try {
            return new String(paramArrayOfByte, paramInt1, paramInt2, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return "encoding error！";
        }
    }

Simple enough, right? To let the user choose the encoding, add a static field and a setter, giving:

    // requires: import java.io.UnsupportedEncodingException; import java.nio.charset.Charset;
    // Defaults to the platform default encoding
    private static String filenameEncoding = Charset.defaultCharset().name();

    public static void setFilenameEncoding(String encoding) {
        filenameEncoding = encoding;
    }

    private static String getUTF8String(byte[] b, int off, int len) {
        try {
            // parameter names fixed: the posted version mixed b/off/len with paramArrayOfByte/paramInt1/paramInt2
            return new String(b, off, len, filenameEncoding);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return "encoding error！";
        }
    }

And now it is "perfect".

Postscript: I tested the code above myself and it works. I don't know why SUN didn't write it this way. This post is meant to start a discussion. My writing isn't great, so please bear with me; I hope the meaning is clear and that you will share your opinions.

Notice: ITeye articles are copyrighted by their authors and protected by law. They may not be reproduced without the author's written permission.
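For anyone who wants to try the "let the caller pick the file name encoding" idea without patching the JDK class: since Java 7, `ZipInputStream` and `ZipOutputStream` both have constructors that take a `Charset`. Below is a minimal sketch of that approach, not a definitive implementation. The class and method names (`ZipNameEncodingDemo`, `zipWithEntry`, `entryNames`, `lenientUtf8`) are made up for illustration, and it assumes the runtime provides the GBK charset. It also shows one difference worth knowing about the String-based decoding above: `new String(bytes, charset)` replaces malformed input with U+FFFD instead of throwing, unlike the hand-written decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from the JDK or the post.
class ZipNameEncodingDemo {

    // Builds an in-memory zip with one empty entry whose name is
    // written using the given charset (Java 7+ constructor).
    static byte[] zipWithEntry(String entryName, Charset nameCharset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Lists entry names, decoding them with the given charset (Java 7+ constructor).
    static List<String> entryNames(byte[] zipBytes, Charset nameCharset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), nameCharset)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Decodes bytes as UTF-8 the way new String(...) does: malformed
    // sequences become U+FFFD, and no exception is thrown.
    static String lenientUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this runtime
        String name = "\u4e2d\u6587.txt";     // "中文.txt", escaped to avoid source-encoding issues

        byte[] zip = zipWithEntry(name, gbk);
        System.out.println("round trip ok: " + entryNames(zip, gbk).contains(name));

        String wrong = lenientUtf8(name.getBytes(gbk));
        System.out.println("has U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));
    }
}
```

The design difference from the static `setFilenameEncoding` in the post is that the charset here is passed per stream, so two archives with different name encodings can be read in the same program without sharing global state.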