Wow, LZ4 is fast!
I’ve been doing some experiments with LZ4 recently and I must admit that I am truly impressed. For those not familiar with LZ4, it is a compression format from the LZ77 family. Compared to other similar algorithms (such as Google’s Snappy), LZ4’s file format does not allow for very high compression ratios since:
- you cannot reference sequences which are more than 64kb backwards in the stream,
- it encodes lengths with a scheme that requires 1 + floor(n / 255) bytes to store an integer n, instead of the 1 + floor(log(n) / log(2^7)) bytes that a 7-bit variable-length encoding would require.
This might sound like a lot of lost space, but fortunately things are not that bad: there are generally plenty of opportunities to find repeated sequences within a 64kb block, and unless you are working with trivial inputs, you very rarely need to encode lengths greater than 15. In case you still doubt LZ4's ability to achieve high compression ratios, the original implementation includes a high compression algorithm that can easily achieve a 40% compression ratio on common ASCII text.
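To make the trade-off above concrete, here is a small sketch (not from the original post) comparing the byte cost of the two length encodings, using the same simplified formulas as the text: roughly 1 + n/255 bytes for LZ4's run-of-255 scheme versus 7 payload bits per byte for a varint:

```java
// Byte cost of encoding a length n: LZ4's 255-run scheme vs a 7-bit varint.
// Simplified to mirror the formulas in the text; the real LZ4 format also
// packs lengths below 15 into a nibble of the token byte.
public class LengthEncodingCost {

    // LZ4-style: one byte, plus one extra byte of 255 per 255 of overflow.
    static int lz4Bytes(int n) {
        return 1 + n / 255;
    }

    // Varint-style: 7 payload bits per byte, so ceil(bitLength(n) / 7) bytes.
    static int varintBytes(int n) {
        if (n == 0) return 1;
        int bits = 32 - Integer.numberOfLeadingZeros(n);
        return (bits + 6) / 7;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 255, 10_000, 1_000_000}) {
            System.out.println(n + ": lz4=" + lz4Bytes(n)
                    + " bytes, varint=" + varintBytes(n) + " bytes");
        }
    }
}
```

For small lengths the two schemes cost the same single byte, which is why the overhead rarely matters in practice; the gap only opens up for the long runs that real-world inputs seldom produce.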
But this file format also allows you to write fast compressors and uncompressors, and this is really what LZ4 excels at: compression and uncompression speed. To measure how much faster LZ4 is compared to other well-known compression algorithms, I wrote three Java implementations of LZ4:
- a JNI binding to the original C implementation (including the high compression algorithm),
- a pure Java port, using the standard API,
- a pure Java port that uses the sun.misc.Unsafe API to speed up (un)compression.
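To give an idea of the kind of trick the unsafe port relies on (this is a sketch, not the actual port), sun.misc.Unsafe lets you read four bytes from a byte[] in a single unchecked load, where the safe port must assemble the value byte by byte, paying a bounds check on each access:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative only: readIntUnsafe skips bounds checks, so the caller must
// guarantee that off + 4 <= buf.length. Note that Unsafe.getInt reads in
// native byte order (little-endian on x86).
public class UnsafeRead {
    static final Unsafe UNSAFE;
    static final long BYTE_ARRAY_BASE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Safe variant: four loads, four bounds checks (little-endian order).
    static int readIntSafe(byte[] buf, int off) {
        return (buf[off] & 0xFF)
                | (buf[off + 1] & 0xFF) << 8
                | (buf[off + 2] & 0xFF) << 16
                | (buf[off + 3] & 0xFF) << 24;
    }

    // Unsafe variant: a single unchecked load.
    static int readIntUnsafe(byte[] buf, int off) {
        return UNSAFE.getInt(buf, BYTE_ARRAY_BASE + off);
    }
}
```

In a tight match-finding or copy loop this per-access overhead adds up, which is why the unsafe port pulls ahead of the safe one.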
Then I modified Ning's JVM compressor benchmark (kudos to Ning for sharing it!) to add my compressors, and ran it on the Calgary corpus.
The results are very impressive:
- the JNI default compressor is the fastest one in all cases but one, and the JNI uncompressor is always the fastest one,
- even when compressed with the high compression algorithm, data is still very fast to uncompress, which is great for read-only data,
- the unsafe Java compressor/uncompressor is by far the fastest pure Java compressor/uncompressor,
- the safe Java compressor/uncompressor has comparable performance to some compressors/uncompressors that use the sun.misc.Unsafe API (such as LZF).
Compression (benchmark chart)
Uncompression (benchmark chart)
If you are curious about the compressors whose names start with "LZ4 chunks", these are compressors implemented on top of the Java streams API that compress every 64kb block of the input data separately.
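The chunked approach can be sketched as follows. This is an illustration of the framing idea only, not the actual "LZ4 chunks" code: java.util.zip's Deflater/Inflater stand in for the LZ4 block codec, and the length-prefix framing is an assumption of this sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Split the input into independent 64kb blocks, compress each on its own,
// and prefix each block with its uncompressed and compressed lengths so the
// reader can frame the stream. Deflater stands in for the LZ4 block codec.
public class ChunkedCompressor {
    static final int BLOCK_SIZE = 64 * 1024;

    static byte[] compress(byte[] input) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            byte[] buf = new byte[BLOCK_SIZE * 2]; // room for incompressible blocks
            for (int off = 0; off < input.length; off += BLOCK_SIZE) {
                int len = Math.min(BLOCK_SIZE, input.length - off);
                Deflater deflater = new Deflater();
                deflater.setInput(input, off, len);
                deflater.finish();
                int clen = 0;
                while (!deflater.finished()) {
                    clen += deflater.deflate(buf, clen, buf.length - clen);
                }
                deflater.end();
                data.writeInt(len);   // uncompressed length of this block
                data.writeInt(clen);  // compressed length of this block
                data.write(buf, 0, clen);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen on in-memory streams
        }
    }

    static byte[] decompress(byte[] stream) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (in.available() > 0) {
                int len = in.readInt();
                int clen = in.readInt();
                byte[] cbuf = new byte[clen];
                in.readFully(cbuf);
                Inflater inflater = new Inflater();
                inflater.setInput(cbuf);
                byte[] block = new byte[len];
                int n = 0;
                while (n < len) {
                    n += inflater.inflate(block, n, len - n);
                }
                inflater.end();
                out.write(block);
            }
            return out.toByteArray();
        } catch (IOException | DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because each block is self-contained, a reader can uncompress blocks independently, at the cost of resetting the compression state (and losing cross-block back-references) every 64kb.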
For the full Japex reports, see people.apache.org/~jpountz/lz4.