The Java serialization algorithm revealed

Serialization is the process of saving an object's state to a sequence of bytes; deserialization is the process of rebuilding those bytes into a live object. The Java Serialization API provides a standard mechanism for developers to handle object serialization. In this tip, you will see how to serialize an object, and why serialization is sometimes necessary. You'll learn about the serialization algorithm used in Java, and see an example that illustrates the serialized format of an object. By the time you're done, you should have a solid knowledge of how the serialization algorithm works and what entities are serialized as part of the object at a low level.

Why is serialization required?

In today's world, a typical enterprise application has multiple components and is distributed across various systems and networks. In Java, everything is represented as objects; if two Java components want to communicate with each other, there needs to be a mechanism to exchange data. One way to achieve this is to define your own protocol and transfer an object. This means that the receiving end must know the protocol used by the sender to re-create the object, which would make it very difficult to talk to third-party components. Hence, there needs to be a generic and efficient protocol to transfer the object between components. Serialization is defined for this purpose, and Java components use this protocol to transfer objects.

Figure 1 shows a high-level view of client/server communication, where an object is transferred from the client to the server through serialization.

Figure 1. A high-level view of serialization in action

How to serialize an object

In order to serialize an object, you need to ensure that the class of the object implements the java.io.Serializable interface, as shown in Listing 1.

Listing 1. Implementing Serializable

import java.io.Serializable;

class TestSerial implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

In Listing 1, the only thing you had to do differently from creating a normal class is implement the java.io.Serializable interface. The Serializable interface is a marker interface; it declares no methods at all. It tells the serialization mechanism that the class can be serialized.
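To see what the marker interface buys you, the following minimal sketch tries to write two otherwise identical objects. The class names Marked, Unmarked, and MarkerDemo are made up for this illustration and do not appear in the article; the sketch assumes an in-memory stream is an acceptable stand-in for a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical classes for illustration; they differ only in the marker interface.
class Marked implements Serializable {
    byte version = 100;
}

class Unmarked {
    byte version = 100;
}

final class MarkerDemo {
    /** Returns true if writeObject() accepts the given object. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the object's class does not implement Serializable.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Marked:   " + canSerialize(new Marked()));   // true
        System.out.println("Unmarked: " + canSerialize(new Unmarked())); // false
    }
}
```

The only difference between the two classes is the implements clause, which is what the serialization mechanism checks before it writes anything.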

Now that you have made the class eligible for serialization, the next step is to actually serialize the object. That is done by calling the writeObject() method of the java.io.ObjectOutputStream class, as shown in Listing 2.

Listing 2. Calling writeObject()

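import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public static void main(String args[]) throws IOException {
    FileOutputStream fos = new FileOutputStream("temp.out");
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    TestSerial ts = new TestSerial();
    oos.writeObject(ts); // serializes ts and writes it to temp.out
    oos.flush();
    oos.close();
}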

Listing 2 stores the state of the TestSerial object in a file called temp.out. The call oos.writeObject(ts); actually kicks off the serialization algorithm, which in turn writes the object to temp.out.

To re-create the object from the persistent file, you would employ the code in Listing 3.

Listing 3. Recreating a serialized object

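import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

// readObject() also declares ClassNotFoundException, so main must declare it too.
public static void main(String args[]) throws IOException, ClassNotFoundException {
    FileInputStream fis = new FileInputStream("temp.out");
    ObjectInputStream oin = new ObjectInputStream(fis);
    TestSerial ts = (TestSerial) oin.readObject();
    oin.close();
    System.out.println("version=" + ts.version);
}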

In Listing 3, the object's restoration occurs with the oin.readObject() method call. This method call reads in the raw bytes that we previously persisted and creates a live object that is an exact replica of the original object graph. Because readObject() can read any serializable object, a cast to the correct type is required.

Executing this code will print version=100 on the standard output.
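If you would rather not touch the file system, the same round trip can be sketched in memory. This is an illustration under stated assumptions, not the article's code: RoundTripSample mirrors the fields of Listing 1 under a made-up name, and the helper names toBytes and fromBytes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Same fields as Listing 1; the class name is made up for this sketch.
class RoundTripSample implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

final class RoundTrip {
    /** Serializes obj into a byte array instead of a file. */
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Rebuilds an object from bytes produced by toBytes(). */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RoundTripSample original = new RoundTripSample();
        original.count = 7; // change the state so the copy is distinguishable from a fresh instance
        RoundTripSample copy = (RoundTripSample) fromBytes(toBytes(original));
        System.out.println("version=" + copy.version + ", count=" + copy.count); // version=100, count=7
        System.out.println("same instance? " + (copy == original));             // false
    }
}
```

The copy carries the modified count, which shows that the state comes from the byte stream rather than from the field initializers, and it is a distinct instance from the original.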

The serialized format of an object

What does the serialized version of the object look like? Remember, the sample code in the previous section saved the serialized version of the TestSerial object into the file temp.out. Listing 4 shows the contents of temp.out, displayed in hexadecimal. (You need a hexadecimal editor to see the output in hexadecimal format.) Note that the class name encoded in the dump reads SerialTest, not TestSerial; both names are ten characters long, so the byte count is unaffected.

Listing 4. Hexadecimal form of TestSerial

AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65 73 74 A0 0C 34 00 FE B1 DD F9 02 00 02 42 00 05 63 6F 75 6E 74 42 00 07 76 65 72 73 69 6F 6E 78 70 00 64

If you look again at the actual TestSerial object, you'll see that it has only two byte members, as shown in Listing 5.

Listing 5. TestSerial's byte members

public byte version = 100;
public byte count = 0;

The size of a byte variable is one byte, and hence the total size of the object (without the header) is two bytes. But if you look at the size of the serialized object in Listing 4, you'll see 51 bytes. Surprise! Where did the extra bytes come from, and what is their significance? They are introduced by the serialization algorithm, and are required in order to re-create the object. In the next section, you'll explore this algorithm in detail.
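You can reproduce this kind of dump without a hex editor. The sketch below serializes a two-byte object in memory and prints the stream in hexadecimal. TwoBytes and HexDump are hypothetical names; because the class name and its computed serialVersionUID are part of the stream, the middle of your dump and its total length will differ from Listing 4.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Two byte fields, as in Listing 5; the class name is made up for this sketch.
class TwoBytes implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

final class HexDump {
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Formats bytes as space-separated two-digit uppercase hex, like Listing 4. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new TwoBytes());
        System.out.println(toHex(bytes));
        System.out.println(bytes.length + " bytes in total, 2 of them field data");
    }
}
```

Whatever the class is called, the dump starts with AC ED 00 05 and ends with 00 64: the values of count and version, in that order, since primitive fields are written sorted by name.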

Java's serialization algorithm

By now, you should have a pretty good knowledge of how to serialize an object. But how does the process work under the hood? In general the serialization algorithm does the following:

  • It writes out the metadata of the class associated with an instance.
  • It recursively writes out the description of each superclass until it finds java.lang.Object.
  • Once it finishes writing the metadata information, it starts on the actual data associated with the instance. This time, however, it starts from the topmost superclass.
  • It recursively writes the data associated with the instance, starting from the topmost superclass and ending with the most-derived class.
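The two orderings in the list above can be sketched with the public java.io.ObjectStreamClass API. This is an approximation of the traversal, not the JDK's internal code: Base, Derived, and DescriptorWalk are hypothetical names, and the walk relies on ObjectStreamClass.lookup() returning null for a class that is not serializable, which is where the chain ends (java.lang.Object in this hierarchy).

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical two-level hierarchy, used only for this sketch.
class Base implements Serializable {
    int baseVersion = 10;
}

class Derived extends Base {
    int version = 66;
}

final class DescriptorWalk {
    /** Class names in the order their descriptions are written: most-derived class first. */
    static List<String> descriptorOrder(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            ObjectStreamClass desc = ObjectStreamClass.lookup(c);
            if (desc == null) {
                break; // not serializable: the description chain stops here
            }
            names.add(desc.getName());
        }
        return names;
    }

    /** Class names in the order their field data is written: topmost superclass first. */
    static List<String> dataOrder(Class<?> type) {
        List<String> names = descriptorOrder(type);
        Collections.reverse(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("descriptions: " + descriptorOrder(Derived.class));
        System.out.println("data:         " + dataOrder(Derived.class));
    }
}
```

For this hierarchy the descriptions come out as Derived then Base, and the data as Base then Derived.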

I've written a different example object for this section that covers the main cases: a superclass, a primitive field, and a field that refers to another object. The new sample object to be serialized is shown in Listing 6.

Listing 6. Sample serialized object

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class parent implements Serializable {
    int parentVersion = 10;
}

class contain implements Serializable {
    int containVersion = 11;
}

public class SerialTest extends parent implements Serializable {
    int version = 66;
    contain con = new contain();

    public int getVersion() {
        return version;
    }

    public static void main(String args[]) throws IOException {
        FileOutputStream fos = new FileOutputStream("temp.out");
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        SerialTest st = new SerialTest();
        oos.writeObject(st);
        oos.flush();
        oos.close();
    }
}

This example is a straightforward one. It serializes an object of type SerialTest, which is derived from parent and has a container object, contain. The serialized format of this object is shown in Listing 7.

Listing 7. Serialized form of sample object

AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65 73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07 76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09 4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72 65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00 0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70 00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74 61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00 0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78 70 00 00 00 0B

Figure 2 offers a high-level look at the serialization algorithm for this scenario.

Figure 2. An outline of the serialization algorithm

Let's go through the serialized format of the object in detail and see what each byte represents. Begin with the serialization protocol information:

  • AC ED: STREAM_MAGIC. Specifies that this is a serialization protocol.
  • 00 05: STREAM_VERSION. The serialization version.
  • 0x73: TC_OBJECT. Specifies that this is a new Object.
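These three values are published as constants in java.io.ObjectStreamConstants, so the start of a stream can be checked programmatically. The sketch below is one way to do that; HeaderSample and HeaderCheck are hypothetical names, and it assumes the stream holds a plain serializable object (a String, for instance, is written with a different tag).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class, used only to produce a stream to inspect.
class HeaderSample implements Serializable {
    int version = 66;
}

final class HeaderCheck {
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** True if the stream starts with STREAM_MAGIC, STREAM_VERSION, TC_OBJECT. */
    static boolean hasExpectedHeader(byte[] stream) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            short magic = in.readShort();   // AC ED
            short version = in.readShort(); // 00 05
            byte tag = in.readByte();       // 73
            return magic == ObjectStreamConstants.STREAM_MAGIC
                    && version == ObjectStreamConstants.STREAM_VERSION
                    && tag == ObjectStreamConstants.TC_OBJECT;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasExpectedHeader(serialize(new HeaderSample()))); // true
    }
}
```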

The first step of the serialization algorithm is to write the description of the class associated with an instance. The example serializes an object of type SerialTest, so the algorithm starts by writing the description of the SerialTest class.

  • 0x72: TC_CLASSDESC. Specifies that this is a new class.
  • 00 0A: Length of the class name.
  • 53 65 72 69 61 6C 54 65 73 74: SerialTest, the name of the class.
  • 05 52 81 5A AC 66 02 F6: SerialVersionUID, the serial version identifier of this class.
  • 0x02: Various flags. This particular flag, SC_SERIALIZABLE, says that the object supports serialization.
  • 00 02: Number of fields in this class.
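The eight SerialVersionUID bytes are the big-endian form of a long. If a class declares its own serialVersionUID, that value is used; otherwise the runtime computes one from the class's structure. The sketch below reads the value through ObjectStreamClass and prints it the way it appears in a dump. ExplicitUid, ComputedUid, and UidDemo are hypothetical names.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class that pins its serial version identifier.
class ExplicitUid implements Serializable {
    private static final long serialVersionUID = 1L;
    int version = 66;
}

// Hypothetical class that leaves the identifier to be computed by the runtime.
class ComputedUid implements Serializable {
    int version = 66;
}

final class UidDemo {
    /** The identifier that would be written into the class description. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    /** The identifier as the 16 hex digits (8 bytes) seen in a stream dump. */
    static String asStreamBytes(long uid) {
        return String.format("%016X", uid);
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + asStreamBytes(uidOf(ExplicitUid.class))); // 0000000000000001
        System.out.println("computed: " + asStreamBytes(uidOf(ComputedUid.class)));
    }
}
```

The computed value depends on the class's name, fields, and methods, which is why no expected output is shown for the second line.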

Next, the algorithm writes the field int version = 66;.

  • 0x49: Field type code. 49 is the ASCII code for "I", which stands for int.
  • 00 07: Length of the field name.
  • 76 65 72 73 69 6F 6E: version, the name of the field.

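And then the algorithm writes the next field, contain con = new contain();. This is an object, so after the field's type code and name the algorithm writes the canonical JVM signature of the field's type.

  • 0x4C: Field type code. 4C is the ASCII code for "L", which marks an object reference.
  • 00 03: Length of the field name.
  • 63 6F 6E: con, the name of the field.
  • 0x74: TC_STRING. Represents a new string.
  • 00 09: Length of the string.
  • 4C 63 6F 6E 74 61 69 6E 3B: Lcontain;, the canonical JVM signature.
  • 0x78: TC_ENDBLOCKDATA, the end of the optional block data for an object.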
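The walkthrough so far can be replayed in code. The sketch below feeds the leading bytes of Listing 7 (up to and including the first TC_ENDBLOCKDATA) to a small hand-written reader built on DataInputStream. ClassDescReader and its methods are hypothetical names, and the reader is deliberately limited: it handles only the elements that occur in this prefix and assumes every object signature is written as a new TC_STRING.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ClassDescReader {
    // Leading bytes of Listing 7, up to and including the first TC_ENDBLOCKDATA.
    static final String LISTING_7_PREFIX =
            "AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65 73 74 "
            + "05 52 81 5A AC 66 02 F6 02 00 02 "
            + "49 00 07 76 65 72 73 69 6F 6E "
            + "4C 00 03 63 6F 6E 74 00 09 4C 63 6F 6E 74 61 69 6E 3B 78";

    static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Returns one line per decoded element of the first class description. */
    static List<String> describe(byte[] stream) {
        List<String> lines = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            in.readShort(); // STREAM_MAGIC
            in.readShort(); // STREAM_VERSION
            expect(in.readByte(), ObjectStreamConstants.TC_OBJECT);
            expect(in.readByte(), ObjectStreamConstants.TC_CLASSDESC);
            lines.add("class " + in.readUTF()); // 2-byte length, then the name
            lines.add(String.format("serialVersionUID %016X", in.readLong()));
            lines.add("flags " + in.readByte());
            int fieldCount = in.readShort();
            for (int i = 0; i < fieldCount; i++) {
                char typeCode = (char) in.readByte();
                String line = "field " + typeCode + " " + in.readUTF();
                if (typeCode == 'L' || typeCode == '[') {
                    // Object and array fields carry a JVM signature string.
                    expect(in.readByte(), ObjectStreamConstants.TC_STRING);
                    line += " " + in.readUTF();
                }
                lines.add(line);
            }
            expect(in.readByte(), ObjectStreamConstants.TC_ENDBLOCKDATA);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    private static void expect(byte actual, byte expected) {
        if (actual != expected) {
            throw new IllegalStateException(
                    String.format("expected 0x%02X but found 0x%02X", expected, actual));
        }
    }

    public static void main(String[] args) {
        describe(fromHex(LISTING_7_PREFIX)).forEach(System.out::println);
    }
}
```

Run against the Listing 7 prefix, this prints the class name SerialTest, the identifier 0552815AAC6602F6, the flags value 2, and the two fields: I version and L con Lcontain;.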

The next step of the algorithm is to write the description of the parent class, which is the immediate superclass of SerialTest.

  • 0x72: TC_CLASSDESC. Specifies that this is a new class.
  • 00 06: Length of the class name.
  • 70 61 72 65 6E 74: parent, the name of the class.
  • 0E DB D2 BD 85 EE 63 7A: SerialVersionUID, the serial version identifier of this class.
  • 0x02: Various flags. This flag, SC_SERIALIZABLE, notes that the object supports serialization.
  • 00 01: Number of fields in this class.

Now the algorithm will write the field description for the parent class. parent has one field, int parentVersion = 10;.

  • 0x49: Field type code. 49 is the ASCII code for "I", which stands for int.
  • 00 0D: Length of the field name.
  • 70 61 72 65 6E 74 56 65 72 73 69 6F 6E: parentVersion, the name of the field.
  • 0x78: TC_ENDBLOCKDATA, the end of block data for this object.
  • 0x70: TC_NULL, which represents the fact that there are no more superclasses because we have reached the top of the class hierarchy.

So far, the serialization algorithm has written the description of the class associated with the instance and all its superclasses. Next, it will write the actual data associated with the instance. It writes the parent class members first:

  • 00 00 00 0A: 10, the value of parentVersion.

Then it moves on to SerialTest.

  • 00 00 00 42: 66, the value of version.
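Each int occupies four bytes, most significant byte first. ObjectOutputStream writes primitive values in the same big-endian layout as java.io.DataOutputStream, so the sketch below uses the latter to show where the byte patterns above come from. IntBytes and intToHex are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class IntBytes {
    /** Returns the four big-endian bytes of value as space-separated hex. */
    static String intToHex(int value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : bos.toByteArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("parentVersion = 10 -> " + intToHex(10)); // 00 00 00 0A
        System.out.println("version       = 66 -> " + intToHex(66)); // 00 00 00 42
    }
}
```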

The next few bytes are interesting. The algorithm needs to write the information about the contain object, shown in Listing 8.

Listing 8. The contain object

contain con = new contain();

Remember, the serialization algorithm hasn't written the class description for the contain class yet. This is the opportunity to write this description.

  • 0x73: TC_OBJECT, designating a new object.
  • 0x72: TC_CLASSDESC.
  • 00 07: Length of the class name.
  • 63 6F 6E 74 61 69 6E: contain, the name of the class.
  • FC BB E6 0E FB CB 60 C7: SerialVersionUID, the serial version identifier of this class.
  • 0x02: Various flags. This flag indicates that this class supports serialization.
  • 00 01: Number of fields in this class.

Next, the algorithm must write the description for contain's only field, int containVersion = 11;.

  • 0x49: Field type code. 49 is the ASCII code for "I", which stands for int.
  • 00 0E: Length of the field name.
  • 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E: containVersion, the name of the field.
  • 0x78: TC_ENDBLOCKDATA.

Next, the serialization algorithm checks to see if contain has any parent classes. If it did, the algorithm would start writing that class; but in this case there is no superclass for contain, so the algorithm writes TC_NULL.

  • 0x70: TC_NULL.

Finally, the algorithm writes the actual data associated with contain.

  • 00 00 00 0B: 11, the value of containVersion.

Conclusion

In this tip, you have seen how to serialize an object, and learned how the serialization algorithm works in detail. I hope this article gives you more detail on what happens when you actually serialize an object.
