Java Encoding, Garbled Text, and Characters Series (2)

alan0509



An introduction to Java encodings and how to handle them

Run the following code step by step:

System.out.println(Charset.defaultCharset());

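All output shown here comes from my local machine. Results may differ on other operating systems.

Running the code above prints GBK, which means the system default encoding is GBK.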
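For readers who want to check their own machine, here is a minimal sketch that wraps the one-liner above in a runnable class. The class and method names are mine, not from the original code. Note that on JDK 18 and later the default charset is UTF-8 on every platform unless it is overridden, so you may not see GBK even on a Chinese Windows system.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Returns the name of the charset the JVM falls back to
    // when no charset is passed to a reader, writer, or getBytes().
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // The file.encoding system property is usually related to the default
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```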

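Open Notepad and create a new file named gbk.txt. It is named gbk because it is saved in the system default encoding, which is GBK. The file content is: 挨踢good

Run the following code: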

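// gbk.txt was saved in the default character encoding (GBK on this machine)
InputStream is1 = new FileInputStream("gbk.txt");
// No charset is given, so the reader decodes with the platform default
InputStreamReader streamReader1 = new InputStreamReader(is1);
// available() returns a byte count, so the array may be longer than the decoded text
char[] chars1 = new char[is1.available()];
streamReader1.read(chars1, 0, is1.available());
System.out.println(new String(chars1));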

Output: 挨踢good

Run the following code:

System.out.println("------------utf----------------");
InputStream is = new FileInputStream("gbk.txt");
// Force UTF-8 decoding even though the file holds GBK bytes
InputStreamReader streamReader = new InputStreamReader(is, "utf-8");
System.out.println("change encoding " + streamReader.getEncoding());
char[] chars = new char[is.available()];
streamReader.read(chars, 0, is.available());
System.out.println(new String(chars));

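Output:

------------utf----------------
change encoding UTF8
????good

Explanation: 1. When we read gbk.txt as UTF-8, the text comes out garbled. The file was created in the system default encoding, GBK, so its bytes are not valid UTF-8 sequences and cannot be decoded correctly.

2. Why is good not garbled? This depends on the encodings involved: for English letters, GBK and UTF-8 produce the same bytes. For details see http://alan0509.iteye.com/blog/729929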
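The point about English letters can be checked directly by comparing the bytes each encoding produces. The sketch below is illustrative only: the class and method names are mine, and it assumes the running JDK ships the GBK charset (most full JDK builds do). The Chinese text is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiOverlapDemo {
    // Returns true when the text encodes to identical bytes in GBK and UTF-8.
    static boolean sameBytes(String text) {
        // Assumption: GBK is available in this JDK
        Charset gbk = Charset.forName("GBK");
        byte[] asGbk = text.getBytes(gbk);
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(asGbk, asUtf8);
    }

    public static void main(String[] args) {
        String chinese = "\u6328\u8e22"; // 挨踢
        System.out.println(sameBytes("good"));  // ASCII letters: same bytes
        System.out.println(sameBytes(chinese)); // Chinese text: different bytes
        // GBK uses 2 bytes per character here, UTF-8 uses 3
        System.out.println(chinese.getBytes(Charset.forName("GBK")).length);
        System.out.println(chinese.getBytes(StandardCharsets.UTF_8).length);
    }
}
```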

Create a file named utf8.txt, saved in UTF-8 as the name suggests. The file content is: 上海世大咨询 (this time I did not include any English letters).

Run the following code:

System.out.println("------------utf----------------");
InputStream is2 = new FileInputStream("utf8.txt");
// No charset is given, so the UTF-8 file is decoded as GBK (the default here)
InputStreamReader streamReader2 = new InputStreamReader(is2);
System.out.println("use gbk to read utf-8 "
		+ streamReader2.getEncoding());
char[] chars2 = new char[is2.available()];
streamReader2.read(chars2, 0, is2.available());
// chars2.length is the file size in bytes, not the number of decoded characters
System.out.println(new String(chars2) + chars2.length);

Output:

------------utf----------------
use gbk to read utf-8 GBK
锘夸笂娴蜂笘澶у挩璇 21

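As this shows, when we read a file without specifying an encoding, the system default encoding is used automatically. This gives us the first cause of garbled text: to read a file correctly, we need to know which encoding it was written in.

Run the following code: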
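To make the conclusion above concrete, here is a minimal sketch of reading a file with an explicitly named charset. It is not the code used in this article: the class name, the `roundTrip` helper, and the temporary file are all my own, and the GBK case assumes that charset is available in the running JDK. It also reads with a loop until end of stream rather than sizing a buffer from `available()`.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Writes text to a temp file in one charset, then reads it back in another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(writeAs));
                StringBuilder sb = new StringBuilder();
                // The charset is passed explicitly, so the platform default is never used
                try (Reader reader = new InputStreamReader(Files.newInputStream(tmp), readAs)) {
                    char[] buf = new char[1024];
                    int n;
                    while ((n = reader.read(buf)) != -1) {
                        sb.append(buf, 0, n); // append only the chars actually read
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e0a\u6d77\u4e16\u5927\u54a8\u8be2"; // 上海世大咨询
        // Matching charsets recover the text
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));
        // Mismatched charsets do not (assumes GBK is available)
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, Charset.forName("GBK")).equals(text));
    }
}
```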

System.out.println("-----------use byte-----------------");
File file2 = new File("utf8.txt");
InputStream inputStream = new FileInputStream(file2);
// Read the raw bytes first; nothing has been decoded yet
byte[] bs = new byte[(int) file2.length()];
inputStream.read(bs);

// Decode the UTF-8 bytes with the wrong charset (UTF-16)
String s = new String(bs, "utf-16");
System.out.print("use byte to read utf-8，and use utf-16 decode"
		+ "use utf-16  " + s);
System.out.println();

// Decode the same bytes with the right charset (UTF-8)
String s2 = new String(bs, "utf-8");
System.out.print("use byte read utf-8，and utf-8 decode" + "use utf-8  "
		+ s2);
System.out.println();

Output:

-----------use byte-----------------
use byte to read utf-8，and use utf-16 decodeuse utf-16 ??????銨??
use byte read utf-8，and utf-8 decodeuse utf-8 ?上海世大咨询

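Explanation: here I read the file content as raw bytes and then decode those bytes. Decoding them as UTF-16 does not recover the text, while decoding them as UTF-8 does. Note, however, that an extra question mark appears at the start. The reason will be covered in a later article.

Run the following code: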

System.out.println("------------utf----------------");
File file = new File("utf8.txt");
InputStream is3 = new FileInputStream(file);
// This time the reader is told the file's real encoding
InputStreamReader streamReader3 = new InputStreamReader(is3, "utf-8");
System.out.println(streamReader3.getEncoding());
char[] chars3 = new char[is3.available()];
streamReader3.read(chars3, 0, is3.available());
System.out.println(new String(chars3));

Output:

------------utf----------------
UTF8
?上海世大咨询

Note: the question mark will be explained in a later article.

Run the following code:
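The article leaves the question mark for later. One likely cause, which is my assumption and not something this article confirms, is a UTF-8 byte order mark (the bytes EF BB BF) that Notepad commonly writes at the start of UTF-8 files. That would fit the 21-byte length reported earlier for six 3-byte characters. Java's UTF-8 decoder keeps the mark as the character U+FEFF, which a GBK console cannot display. A minimal sketch of skipping it, with a hypothetical helper name:

```java
import java.nio.charset.StandardCharsets;

public class Utf8BomDemo {
    // Decodes UTF-8 bytes, skipping a leading EF BB BF byte order mark if present.
    static String decodeUtf8SkippingBom(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            offset = 3; // start decoding after the mark
        }
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        String plain = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(plain.length());                 // 3: U+FEFF plus "ok"
        System.out.println(plain.charAt(0) == '\uFEFF');    // true
        System.out.println(decodeUtf8SkippingBom(withBom)); // ok
    }
}
```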

System.out.println("------------------------");
String sp = "上海世大";
// No argument: getBytes() encodes with the default charset (GBK here)
byte[] bs2 = sp.getBytes();
// Explicit charset: the result does not depend on the machine
byte[] bs3 = sp.getBytes("utf-8");
String string = new String(bs3, "utf-8");
String string2 = new String(bs2, Charset.defaultCharset());
String string3 = new String(bs2, "utf-16");
System.out
		.println("encode with utf-8, then decode with utf-8    "
				+ string);
System.out
		.println("encode with default charset, then decode with default charset    "
				+ string2);
System.out
		.println("encode with default charset, then decode with utf-16    "
				+ string3);

Output:

------------------------
encode with utf-8, then decode with utf-8    上海世大
encode with default charset, then decode with default charset    上海世大
encode with default charset, then decode with utf-16    ????

As this shows, it pays to specify the encoding explicitly when calling getBytes.

A few more notes on how Java handles encodings:
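The same idea can be reduced to a small helper that encodes with one charset and decodes with another. This is only a sketch with names of my own choosing. It uses the `StandardCharsets` constants, which avoid both the platform default and the checked exception thrown by the string-named overloads.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {
    // Encodes text with one charset and decodes the bytes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String sp = "\u4e0a\u6d77\u4e16\u5927"; // 上海世大
        // Same charset on both sides: the text survives
        System.out.println(recode(sp, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(sp));
        // Different charsets: the bytes are reinterpreted and the text is lost
        System.out.println(recode(sp, StandardCharsets.UTF_8, StandardCharsets.UTF_16).equals(sp));
    }
}
```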

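Java class files store strings in UTF-8 (more precisely, a modified form of it), while the JVM uses UTF-16 at run time. (Note that this is about .class files. A .java source file is not a class file: unless an encoding is specified, it is saved in whatever encoding the IDE is configured to use, which is usually the system default. This is why our .java files may show garbled text when moved to a different machine, for the reasons described above.)
See my third article for more.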
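One visible consequence of the run-time UTF-16 representation is that `String.length()` counts UTF-16 code units, not characters. The sketch below illustrates this with names of my own. The strings are written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class Utf16UnitsDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (actual characters).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4e0a\u6d77";    // 上海: one code unit per character
        String astral = "\uD83D\uDE00"; // U+1F600 stored as a surrogate pair
        System.out.println(utf16Units(bmp) + " " + codePoints(bmp));        // 2 2
        System.out.println(utf16Units(astral) + " " + codePoints(astral));  // 2 1
        // The same single character takes 4 bytes once encoded as UTF-8
        System.out.println(astral.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```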
