Java comes with its own serialization mechanism, called Java Object Serialization (often
referred to simply as “Java Serialization”), that is tightly integrated with the language,
so it’s natural to ask why this wasn’t used in Hadoop. Here’s what Doug Cutting said
in response to that question:
“Why didn’t I use Serialization when we first started Hadoop? Because it looked
big and hairy and I thought we needed something lean and mean, where we had
precise control over exactly how objects are written and read, since that is central
to Hadoop. With Serialization you can get some control, but you have to fight for
it.
“The logic for not using RMI was similar. Effective, high-performance inter-process
communications are critical to Hadoop. I felt like we’d need to precisely control
how things like connections, timeouts and buffers are handled, and RMI gives you
little control over those.”
The problem is that Java Serialization doesn’t meet the criteria for a serialization format
listed earlier: compact, fast, extensible, and interoperable.
Java Serialization is not compact: it writes the classname of each object being written
to the stream—this is true of classes that implement java.io.Serializable or
java.io.Externalizable. Subsequent instances of the same class write a reference handle
to the first occurrence, which occupies only 5 bytes. However, reference handles
don’t work well with random access, since the referent class may occur at any point in
the preceding stream—that is, there is state stored in the stream. Even worse, reference
handles play havoc with sorting records in a serialized stream, since the first record of
a particular class is distinguished and must be treated as a special case.
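The effect of reference handles can be observed with nothing but java.io. The following is a minimal sketch, not Hadoop code: the HandleDemo and Point names are hypothetical and exist only for this illustration. It writes two instances of the same class to one ObjectOutputStream and measures how many bytes each adds. The first write carries the class description; the second only refers back to it, which is exactly the stream state that makes the second record unreadable on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class HandleDemo {
    // Hypothetical record type, used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Returns {bytes added by the first Point, bytes added by the second Point}.
    static int[] bytesPerObject() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.flush();                       // push the stream header out first
            int header = bytes.size();

            out.writeObject(new Point(1, 2));  // writes the full class description
            out.flush();
            int first = bytes.size() - header;

            out.writeObject(new Point(3, 4));  // writes a handle to that description
            out.flush();
            int second = bytes.size() - header - first;

            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = bytesPerObject();
        System.out.println("first object:  " + sizes[0] + " bytes");
        System.out.println("second object: " + sizes[1] + " bytes");
    }
}
```

The second record is much smaller, but it cannot be decoded without the bytes of the first, so records cannot be reordered or read from an arbitrary offset.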
All these problems are avoided by not writing the classname to the stream at all, which
is the approach that Writable takes. This makes the assumption that the client knows
the expected type. The result is that the format is considerably more compact than Java
Serialization, and random access and sorting work as expected since each record is
independent of the others (so there is no stream state).
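To make the size difference concrete, here is a small sketch that serializes a single integer two ways: as a java.lang.Integer through ObjectOutputStream, and as a bare 4-byte value through DataOutputStream.writeInt. The second path is a plain java.io stand-in for what a Writable such as IntWritable does, and the SizeComparison class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class SizeComparison {
    // Size of the value when written with Java Object Serialization.
    static int javaSerializedSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value));  // includes class metadata
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size of the value when only its four bytes are written,
    // on the assumption that the reader already knows the type.
    static int rawSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("Java Serialization: " + javaSerializedSize(163) + " bytes");
        System.out.println("Raw int:            " + rawSize(163) + " bytes");
    }
}
```

Because the raw form carries no type information, every record has the same fixed layout and can be read or compared without consulting anything earlier in the stream.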
Java Serialization is a general-purpose mechanism for serializing graphs of objects, so
it necessarily has some overhead for serialization and deserialization operations. What’s
more, the deserialization procedure creates a new instance for each object deserialized
from the stream. Writable objects, on the other hand, can be (and often are) reused.
For example, for a MapReduce job, which at its core serializes and deserializes billions
of records of just a handful of different types, the savings gained by not having to allocate
new objects are significant.
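Object reuse can be sketched as follows. IntRecord is a hypothetical class that only mirrors the write/readFields shape of Hadoop's Writable interface using java.io types; it is not the Hadoop API itself. A single instance is refilled for every record read from the stream, so the read loop allocates no new record objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReuseDemo {
    // Hypothetical stand-in with the same shape as a Writable.
    static class IntRecord {
        int value;
        void write(DataOutput out) throws IOException { out.writeInt(value); }
        void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    // Writes the values 0..count-1, then reads them back into ONE reused instance.
    static long sumWithOneInstance(int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            IntRecord record = new IntRecord();
            for (int i = 0; i < count; i++) {
                record.value = i;
                record.write(out);
            }
            out.flush();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            long sum = 0;
            for (int i = 0; i < count; i++) {
                record.readFields(in);  // overwrites the same object each time
                sum += record.value;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithOneInstance(1000));
    }
}
```

With ObjectInputStream.readObject, by contrast, each call returns a freshly created object, so the equivalent loop would allocate one instance per record.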
In terms of extensibility, Java Serialization has some support for evolving a type, but it
is brittle and hard to use effectively. Writables have no built-in support at all: the
programmer has to manage type evolution by hand.
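One common way to manage evolution by hand is to write an explicit version byte at the start of each record. The sketch below assumes that convention; VersionedRecord, its fields, and the writeV1 helper are all hypothetical, and the class uses java.io types rather than the Hadoop interfaces. A reader that understands version 2 can still read version 1 data by defaulting the field that did not exist yet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionedRecord {
    static final byte CURRENT_VERSION = 2;

    int id;
    String label = "";  // hypothetical field added in version 2

    void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION);
        out.writeInt(id);
        out.writeUTF(label);
    }

    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version < 1 || version > CURRENT_VERSION) {
            throw new IOException("Unsupported version: " + version);
        }
        id = in.readInt();
        // Version 1 records have no label, so fall back to a default.
        label = (version >= 2) ? in.readUTF() : "";
    }

    // Simulates the bytes an old (version 1) writer would have produced.
    static byte[] writeV1(int id) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(1);
            out.writeInt(id);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static VersionedRecord fromBytes(byte[] data) {
        try {
            VersionedRecord record = new VersionedRecord();
            record.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedRecord old = fromBytes(writeV1(7));
        System.out.println(old.id + " [" + old.label + "]");
    }
}
```

The cost is one extra byte per record plus branching logic that must be maintained for every version ever written, which is the burden the text refers to.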
In principle, other languages could interpret the Java Serialization stream protocol (defined
by the Java Object Serialization Specification), but in practice there are no widely
used implementations in other languages, so it is a Java-only solution. The situation is
the same for Writables.