A Summary of Java String Serialization

akingde


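String is all too familiar to us, because it is everywhere, and even more because a String can describe almost anything in this world. Even exact numeric values need String to represent them (there is always a gap between the binary that the computer sees and the decimal that humans see). Familiarity makes it seem simple, and easy to overlook. Today I am recording two easily overlooked issues about String.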

String reuse: saving memory

Because there are so many strings, reusing them can save a lot of memory. First, look at the following example:

String string1 = "HELLOHELLO";

String string2 = "HELLO" + "HELLO";

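How many strings does this create? 1 or 2? The second one looks as if it is built dynamically, but both operands are literals, so the content is already known at compile time and the expression can be folded into a single constant. My guess was one instance, i.e. the same char array. A heap dump confirmed that there is indeed only one.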

String string3 = args[0]+ args[1];

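Pass HELLO HELLO as the arguments: how many strings are there now? Right, there are two HELLOHELLO instances, and a heap dump confirms it. (A heap dump is not really needed; a debugger shows the same thing, because the char[] fields of string1 and string3 point to different addresses.)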
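The two cases above can also be checked with a reference comparison instead of a heap dump. The following is a minimal sketch; the class and method names are made up for illustration, and the runtime operands are passed as method parameters rather than read from args.

```java
public class StringIdentityDemo {

    // "HELLO" + "HELLO" is a compile-time constant expression, so both
    // variables refer to the same pooled String instance.
    public static boolean constantsShareInstance() {
        String string1 = "HELLOHELLO";
        String string2 = "HELLO" + "HELLO";
        return string1 == string2;
    }

    // The operands are only known at run time, so the concatenation
    // creates a new String object that is equal to, but not the same as,
    // the literal.
    public static boolean runtimeConcatSharesInstance(String a, String b) {
        String string1 = "HELLOHELLO";
        String string3 = a + b;
        return string1 == string3;
    }

    public static void main(String[] args) {
        System.out.println(constantsShareInstance());                      // true
        System.out.println(runtimeConcatSharesInstance("HELLO", "HELLO")); // false
    }
}
```

Comparing with == is used here only to expose object identity; ordinary code should keep comparing strings with equals.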

By extension, strings produced by Java deserialization are distinct instances as well. An example:

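import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class StringDeserialized {

    public final static void main(String[] args) throws Exception {
        new StringDeserialized().testDeserialized();
    }

    public void testDeserialized() throws Exception {
        String testString = "HELLOHELLO";
        ObjectOutputStream dataOutStream = new ObjectOutputStream(new FileOutputStream("./stringdeserialized.data"));
        for (int i = 0; i < 1000; i++)
            dataOutStream.writeObject(testString);
        dataOutStream.close();

        List<String> readAgainList = new ArrayList<String>(100);
        // Each pass opens a new stream and reads the first object, which creates a new String.
        for (int i = 0; i < 100; i++) {
            ObjectInputStream dataInputStream = new ObjectInputStream(new FileInputStream("./stringdeserialized.data"));
            readAgainList.add((String) dataInputStream.readObject());
            dataInputStream.close();
        }

        // Keep the JVM alive so that a heap dump can be taken.
        Thread.sleep(Integer.MAX_VALUE);
    }
}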

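The screenshot, taken from a heap dump, shows 101 instances of HELLOHELLO (the 100 deserialized copies plus the original literal), occupying more than 8080 bytes. For JVM memory usage, see http://www.javamex.com/tutorials/memory/object_memory_usage.shtml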
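The same effect can be reproduced without a file or a heap dump. The sketch below is an in-memory variant of the test above; the class and helper names are assumptions for illustration. It also shows the contrasting case: two reads from one stream return the same instance, because the second writeObject call only stores a back-reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class DeserializedCopiesDemo {

    // Writes the same String object `times` times into one stream.
    public static byte[] serialize(String value, int times) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (int i = 0; i < times; i++) {
                out.writeObject(value);
            }
        }
        return buffer.toByteArray();
    }

    // Opens a fresh stream and reads only the first object.
    public static String readFirst(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (String) in.readObject();
        }
    }

    // Reads two objects from one stream and reports whether they are the same instance.
    public static boolean sameInstanceWithinOneStream(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return first == second;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize("HELLOHELLO", 2);
        String first = readFirst(bytes);
        String second = readFirst(bytes);
        System.out.println(first.equals(second));               // true
        System.out.println(first == second);                    // false
        System.out.println(sameInstanceWithinOneStream(bytes)); // true
    }
}
```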

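Here is the problem: most of the data the system maintains is string information, for example in configserver, and much of it is the same string. Deserializing it from the network over and over uses a lot of heap. You could of course write a weak hash map yourself to maintain and reuse these strings. But as we know, the JVM has a String pool, and using it is undoubtedly the best option. Looking at the String source, the comment on intern() reads: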

* When the intern method is invoked, if the pool already contains a

* string equal to this <code>String</code> object as determined by

* the {@link #equals(Object)} method, then the string from the pool is

* returned. Otherwise, this <code>String</code> object is added to the

* pool and a reference to this <code>String</code> object is returned.

So change one line of the code above to:

readAgainList.add(((String) dataInputStream.readObject()).intern());

The heap dump was analyzed again after this change; it also shows that a String containing 10 characters occupies 80 bytes of heap.
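Both options mentioned above, intern() and a hand-written weak hash map, can be sketched side by side. This is only an illustration: the StringDedupDemo class and its canonical method are hypothetical names, and new String(...) stands in for a freshly deserialized copy.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class StringDedupDemo {

    // Hypothetical hand-rolled pool. The value is wrapped in a WeakReference
    // so that the map does not keep its own key strongly reachable.
    private static final Map<String, WeakReference<String>> POOL =
            new WeakHashMap<String, WeakReference<String>>();

    // Returns the pooled instance equal to `value`, adding `value` if absent.
    public static synchronized String canonical(String value) {
        WeakReference<String> ref = POOL.get(value);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        POOL.put(value, new WeakReference<String>(value));
        return value;
    }

    public static void main(String[] args) {
        String copy1 = new String("HELLOHELLO".toCharArray());
        String copy2 = new String("HELLOHELLO".toCharArray());

        System.out.println(copy1 == copy2);                       // false
        System.out.println(copy1.intern() == copy2.intern());     // true
        System.out.println(canonical(copy1) == canonical(copy2)); // true
    }
}
```

The weak pool lets entries disappear once nothing else references the string, at the cost of a lock and an extra map lookup on every call; intern() needs no extra code.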

Speed of string serialization

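Currently, in order to support data of so-called arbitrary types, CS uses a trick: it uses Swizzle to hold the Java-serialized bytes, so the server can store data of any type without deserializing it. This has two drawbacks: general-purpose Java serialization is not efficient, and the protocol is not universal, so support for other languages is poor. Since the data is at present almost entirely of type String, String data could be handled specially and represented by the string's byte array (UTF-8), which would also make it easier for other languages to parse. Adding support for publish(String) is worth considering. So I ran the following test to compare the speed and size of different ways to serialize/deserialize a String.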

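The result: writeUTF is the smallest and the fastest. For a String of 100 chars the difference is an order of magnitude and quite obvious, even though Swizzle uses a trick: when the same Swizzle instance is transmitted multiple times, it does not need to be serialized again.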
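To give a rough idea of where a size difference comes from, the sketch below compares writeObject on an ObjectOutputStream with writeUTF on a plain DataOutputStream. Note the assumption: the test in this post calls writeUTF on an ObjectOutputStream, which adds stream and block headers on top of the encoded string, whereas DataOutputStream is used here to show the bare encoding (a 2-byte length followed by the modified UTF-8 bytes). The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class EncodedSizeDemo {

    // Size of the string written with default Java serialization.
    public static int objectStreamSize(String data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(data);
        }
        return buffer.size();
    }

    // Bare writeUTF encoding: 2-byte length prefix plus modified UTF-8 bytes.
    public static byte[] encodeUtf(String data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(data);
        }
        return buffer.toByteArray();
    }

    // Decodes what encodeUtf produced.
    public static String decodeUtf(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "HELLOHELLO";
        byte[] utf = encodeUtf(data);
        System.out.println("writeUTF bytes:    " + utf.length); // 12 for this ASCII input
        System.out.println("writeObject bytes: " + objectStreamSize(data));
        System.out.println(decodeUtf(utf).equals(data));        // true
    }
}
```

Keep in mind that writeUTF is limited to strings whose encoded form fits in 65535 bytes, so a real publish(String) path would need a different length prefix for larger values.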

PS: Simply put, Swizzle wraps a message and caches its serialized byte stream, so if the same message has to be pushed/sent N times, N-1 serializations are saved.
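The Swizzle source is not shown in this post, so the following is only a sketch of the caching idea described above. CachedPayload and all of its members are hypothetical names, not the real Swizzle API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for Swizzle: serializes its value once and reuses the bytes.
public class CachedPayload {
    private final Serializable value;
    private byte[] cachedBytes;      // filled on the first call to toBytes()
    private int serializationCount;  // kept only to make the saving visible

    public CachedPayload(Serializable value) {
        this.value = value;
    }

    // Returns the serialized form, serializing only on the first call.
    // Callers must treat the returned array as read-only.
    public synchronized byte[] toBytes() throws IOException {
        if (cachedBytes == null) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            cachedBytes = buffer.toByteArray();
            serializationCount++;
        }
        return cachedBytes;
    }

    public synchronized int getSerializationCount() {
        return serializationCount;
    }

    public static void main(String[] args) throws IOException {
        CachedPayload payload = new CachedPayload("HELLOHELLO");
        for (int i = 0; i < 5; i++) {
            payload.toBytes(); // sent 5 times, serialized once
        }
        System.out.println(payload.getSerializationCount()); // 1
    }
}
```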

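import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

public class CompareSerialization {

    // Builds a random string of the given length from chars in the range 0..126.
    public String generateTestData(int stringLength) {
        Random random = new Random();
        StringBuilder builder = new StringBuilder(stringLength);
        for (int j = 0; j < stringLength; j++) {
            builder.append((char) random.nextInt(127));
        }
        return builder.toString();
    }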

    // Round trip through default Java serialization (writeObject/readObject).
    public int testJavaDefault(String data) throws Exception {
        ObjectOutputStream outputStream = null;
        ObjectInputStream inputStream = null;
        try {
            ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
            outputStream = new ObjectOutputStream(byteArray);
            outputStream.writeObject(data);
            outputStream.flush();
            inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()));
            inputStream.readObject();
            return byteArray.size();
        }
        finally {
            if (outputStream != null) outputStream.close();
            if (inputStream != null) inputStream.close();
        }
    }

    // Round trip through writeBytes, which keeps only the low byte of each char.
    public int testJavaDefaultBytes(String data) throws Exception {
        ObjectOutputStream outputStream = null;
        ObjectInputStream inputStream = null;
        try {
            ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
            outputStream = new ObjectOutputStream(byteArray);
            outputStream.writeBytes(data);
            outputStream.flush();
            inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()));
            // Exactly data.length() payload bytes were written; read them into the array that is decoded.
            byte[] bytes = new byte[data.length()];
            inputStream.readFully(bytes);
            new String(bytes, "ISO-8859-1");
            return byteArray.size();
        }
        finally {
            if (outputStream != null) outputStream.close();
            if (inputStream != null) inputStream.close();
        }
    }

    // Round trip of a Swizzle wrapper (the Swizzle class itself is not shown in this post).
    public int testSwizzle(Swizzle data) throws Exception {
        ObjectOutputStream outputStream = null;
        ObjectInputStream inputStream = null;
        try {
            ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
            outputStream = new ObjectOutputStream(byteArray);
            outputStream.writeObject(data);
            outputStream.flush();
            inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()));
            inputStream.readObject();
            return byteArray.size();
        }
        finally {
            if (outputStream != null) outputStream.close();
            if (inputStream != null) inputStream.close();
        }
    }

    // Round trip through writeUTF/readUTF (modified UTF-8 with a length prefix).
    public int testStringUTF(String data) throws Exception {
        ObjectOutputStream outputStream = null;
        ObjectInputStream inputStream = null;
        try {
            ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
            outputStream = new ObjectOutputStream(byteArray);
            outputStream.writeUTF(data);
            outputStream.flush();
            inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()));
            inputStream.readUTF();
            return byteArray.size();
        }
        finally {
            if (outputStream != null) outputStream.close();
            if (inputStream != null) inputStream.close();
        }
    }

    public final static void main(String[] args) throws Exception {
        CompareSerialization compare = new CompareSerialization();
        // args[0] is the length of the test string.
        String data = compare.generateTestData(Integer.parseInt(args[0]));
        Swizzle swizzle = new Swizzle(data);

        System.out.println("testJavaDefault size on networking:" + compare.testJavaDefault(data));
        System.out.println("testJavaDefaultBytes size on networking:" + compare.testJavaDefaultBytes(data));
        System.out.println("testStringUTF size on networking:" + compare.testStringUTF(data));
        System.out.println("testSwizzle size on networking:" + compare.testSwizzle(swizzle));

        // warm up
        for (int i = 0; i < 100; i++) {
            compare.testJavaDefault(data);
            compare.testJavaDefaultBytes(data);
            compare.testStringUTF(data);
            compare.testSwizzle(swizzle);
        }

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testJavaDefault(data);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("testJavaDefault using time:" + (endTime - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testJavaDefaultBytes(data);
        }
        endTime = System.currentTimeMillis();
        System.out.println("testJavaDefaultBytes using time:" + (endTime - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testStringUTF(data);
        }
        endTime = System.currentTimeMillis();
        System.out.println("testStringUTF using time:" + (endTime - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testSwizzle(swizzle);
        }
        endTime = System.currentTimeMillis();
        System.out.println("testSwizzle using time:" + (endTime - startTime));
    }
}


2013-04-10 13:49