Demonstrating Charset Encodings

wen66


import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/**
 * Charset encoding test.  Run the same input string, which contains
 * some non-ascii characters, through several Charset encoders and dump out
 * the hex values of the resulting byte sequences.
 */
public class DecodeTest {
    public static void main(String[] args) {
        // This is the character sequence to encode 
        String input = "\u00bfMa\u00f1ana?";
        String [] charsetNames = {
                "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE",
                "UTF-16LE", "UTF-16"
        };
        for (int i = 0; i < charsetNames.length; i++) {
            doEncode (Charset.forName(charsetNames[i]), input);
        }
    }

    private static void doEncode(Charset cs, String input) {
        ByteBuffer bb = cs.encode(input);
        System.out.println("Charset: " + cs.name());
        System.out.println("  input :" + input);
        System.out.println("Encoded: " );
        for (int i = 0; bb.hasRemaining(); i++) {
            int b = bb.get();
            int ival = ((int) b) & 0xff;
            char c = (char) ival;
            // Keep tabular alignment pretty
            if (i < 10) System.out.print(" ");
            // Print index number
            System.out.print("  " + i + ": ");
            // Better formatted output is coming someday...
            if (ival < 16) System.out.print("0");
            // Print the hex value of  the byte
            System.out.print(Integer.toHexString(ival));
            // If the byte seems to be the value of a
            // printable character, print it.  No guarantee
            // it will be.
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                System.out.println("");
            } else {
                System.out.println(" (" + c + ")");
            }
        }
        System.out.println("");
    }
}

Output

Charset: US-ASCII
  input :¿Mañana?
Encoded: 
   0: 3f (?)
   1: 4d (M)
   2: 61 (a)
   3: 3f (?)
   4: 61 (a)
   5: 6e (n)
   6: 61 (a)
   7: 3f (?)

Charset: ISO-8859-1
  input :¿Mañana?
Encoded: 
   0: bf (¿)
   1: 4d (M)
   2: 61 (a)
   3: f1 (ñ)
   4: 61 (a)
   5: 6e (n)
   6: 61 (a)
   7: 3f (?)

Charset: UTF-8
  input :¿Mañana?
Encoded: 
   0: c2 (Â)
   1: bf (¿)
   2: 4d (M)
   3: 61 (a)
   4: c3 (Ã)
   5: b1 (±)
   6: 61 (a)
   7: 6e (n)
   8: 61 (a)
   9: 3f (?)

Charset: UTF-16BE
  input :¿Mañana?
Encoded: 
   0: 00
   1: bf (¿)
   2: 00
   3: 4d (M)
   4: 00
   5: 61 (a)
   6: 00
   7: f1 (ñ)
   8: 00
   9: 61 (a)
  10: 00
  11: 6e (n)
  12: 00
  13: 61 (a)
  14: 00
  15: 3f (?)

Charset: UTF-16LE
  input :¿Mañana?
Encoded: 
   0: bf (¿)
   1: 00
   2: 4d (M)
   3: 00
   4: 61 (a)
   5: 00
   6: f1 (ñ)
   7: 00
   8: 61 (a)
   9: 00
  10: 6e (n)
  11: 00
  12: 61 (a)
  13: 00
  14: 3f (?)
  15: 00

Charset: UTF-16
  input :¿Mañana?
Encoded: 
   0: fe (þ)
   1: ff (ÿ)
   2: 00
   3: bf (¿)
   4: 00
   5: 4d (M)
   6: 00
   7: 61 (a)
   8: 00
   9: f1 (ñ)
  10: 00
  11: 61 (a)
  12: 00
  13: 6e (n)
  14: 00
  15: 61 (a)
  16: 00
  17: 3f (?)
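
In the US-ASCII output above, both ¿ and ñ come out as 3f (?). This happens because Charset.encode() silently replaces characters the charset cannot represent with its replacement byte. The sketch below is one way to detect that case instead of losing data; the class name UnmappableDemo and its helper methods are made up for illustration, while the CharsetEncoder calls are standard java.nio.charset API.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, not part of the original listing.
public class UnmappableDemo {
    /** Returns true if every character of input can be represented in the charset. */
    public static boolean canEncode(String charsetName, String input) {
        return Charset.forName(charsetName).newEncoder().canEncode(input);
    }

    /** Encodes strictly: throws instead of substituting '?' for unmappable characters. */
    public static int strictEncodedLength(String charsetName, String input)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // remaining() is the number of bytes produced by the encoder
        return encoder.encode(CharBuffer.wrap(input)).remaining();
    }

    public static void main(String[] args) {
        String input = "\u00bfMa\u00f1ana?";
        System.out.println("US-ASCII can encode:   " + canEncode("US-ASCII", input));
        System.out.println("ISO-8859-1 can encode: " + canEncode("ISO-8859-1", input));
        try {
            strictEncodedLength("US-ASCII", input);
        } catch (CharacterCodingException e) {
            // Reached because the first character has no US-ASCII representation
            System.out.println("US-ASCII rejected the input: " + e);
        }
    }
}
```

With strict error actions the caller decides what to do about unmappable input, rather than discovering question marks in the data later.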

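UTF-16BE and UTF-16LE encode each character (each 16-bit Java char) as a 2-byte value.
A decoder for such data must therefore either know in advance how the data was encoded
or be able to determine the byte order from the encoded stream itself. The UTF-16
encoding recognizes a byte-order mark: the Unicode character \uFEFF. The mark has its
special meaning only at the very beginning of an encoded stream; if the value is
encountered later, it is mapped according to its defined Unicode value (zero-width
no-break space). A foreign little-endian system might instead prepend the mark as the
bytes FF FE (which read as \uFFFE in big-endian order) and encode the stream as
UTF-16LE. Because the UTF-16 encoding prepends and recognizes the byte-order mark,
systems with different internal byte orders can exchange Unicode data.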

UTF-16BE	No byte-order mark; big-endian byte order
UTF-16LE	No byte-order mark; little-endian byte order
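
The byte-order-mark behavior described above can be checked with a small sketch. The class name BomDemo and its two helper methods are made up for illustration; the charset names and the Charset encode/decode calls are standard JDK API. The little-endian byte array is hand-built to stand in for data written by a foreign little-endian system.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original listing.
public class BomDemo {
    /** Formats the remaining bytes of a buffer as space-separated hex pairs. */
    public static String toHex(ByteBuffer bb) {
        StringBuilder sb = new StringBuilder();
        while (bb.hasRemaining()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", bb.get() & 0xff));
        }
        return sb.toString();
    }

    /** Decodes raw bytes with the named charset. */
    public static String decode(String charsetName, byte[] bytes) {
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // Encoding with "UTF-16" prepends the mark in big-endian order: FE FF.
        System.out.println(toHex(StandardCharsets.UTF_16.encode("Ma")));

        // "Ma" as a little-endian producer would write it: FF FE, then low byte first.
        byte[] littleEndian = {(byte) 0xff, (byte) 0xfe, 0x4d, 0x00, 0x61, 0x00};

        // The "UTF-16" decoder reads the mark, picks the byte order, and drops the mark.
        System.out.println(decode("UTF-16", littleEndian));

        // "UTF-16LE" does not interpret the mark, so it stays in the text as U+FEFF.
        String kept = decode("UTF-16LE", littleEndian);
        System.out.println(kept.length() + " chars, first = U+"
                + Integer.toHexString(kept.charAt(0)).toUpperCase());
    }
}
```

Decoding with the plain "UTF-16" name is the convenient choice when the producer's byte order is unknown; the BE and LE variants suit data whose order is fixed by contract.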

For more information, see Chapter 6 of Java NIO, published by O'Reilly.
