bzhang

A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA


How the test files were selected

I was especially interested in how well LZMA compression would fit into:

  • binary package management of GNU/*/Linux distributions
  • distributing source code of free software

In both cases, files are compressed once on one computer and decompressed many times by users around the world. In practice, the most important factors are:

  • compressed size (faster to download; more packages fit into one CD or DVD)
  • time required in decompression (fast installation is nice)
  • memory requirements for decompression (in case a user wants to use the file on, e.g., an old i486 with 8 MB of RAM)
  • a common format that everyone knows how to decompress and install

Less important:

  • time spent compressing during the package build process; compiling software usually takes several minutes or even hours, so spending one or two minutes to compress the package tightly increases the build time only a little.
  • memory requirements for compression; few people build packages on i486- or i586-class machines with 16 to 64 MB of RAM. However, no one wants to use a tool that needs hundreds of megabytes or even gigabytes of RAM to achieve good results.

Despite the many common factors, the contents of binary packages and source tarballs are quite different. Binary packages primarily contain executables and libraries while source tarballs contain mostly ASCII text of some programming language. Naturally both contain data files used by the program and (hopefully) some documentation.

Test conditions

Tests were run on a laptop:

  • AMD mobile Athlon XP2400+
  • 512 MB RAM
  • Linux 2.6.12-rc4 (preempt, 4k stacks, regparm)
  • gzip 1.3.3, bzip2 1.0.3, LZMA SDK 4.17

bzip2 has two compression modes, one for normal use and another designed for small memory footprint (which can be invoked with 'bzip2 --small'). Only the normal mode was tested because it's faster.

Times are taken from the 'user' line of the output of the 'time' command and rounded. Because of this, the compression and decompression time and speed tables should be taken as indicative rather than as the absolute truth. In practice, the larger test files give more reliable speed comparisons.
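The 'user' figure is the CPU time a process spends in user mode, not the wall-clock time. A minimal Python sketch of collecting the same figure for a child process, assuming a Unix-like system where the standard `resource` module is available; `user_seconds` is a hypothetical helper name, not part of any tool tested here:

```python
import resource
import subprocess
import sys


def user_seconds(cmd: list) -> float:
    """Run cmd to completion and return the user-mode CPU time it consumed.

    This is the figure on the 'user' line of the shell's 'time' command:
    it excludes system (kernel) time and time spent waiting for I/O.
    Unix-only, because it relies on the resource module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    subprocess.run(cmd, check=True)  # waits, so the child's usage is counted
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


if __name__ == "__main__":
    # A real run would use something like ["bzip2", "-9", "-k", "file.tar"].
    print(f"user {user_seconds([sys.executable, '-c', 'sum(range(10**6))']):.1f}s")
```

Because only user time is counted, disk speed has little influence on the result, which suits a comparison of the algorithms themselves.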

When reading the tables, it is important to keep in mind which settings are the default in each program:

  • gzip -6 (best speed/filesize ratio)
  • bzip2 -9 (best compression ratio)
  • lzmash -7 (excellent compression ratio and reasonable memory requirements)
  • lzmash -e (the extreme mode) is not a default; it is included only for reference, for anyone who wants to see how it affects the compression ratio.

The tables of the test results

Note: The first column with numbers 1..9 indicates the compression setting passed to gzip, bzip2 and lzmash (e.g. "gzip -9").
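To rerun a comparison like this on your own files, a rough harness might look like the sketch below. It assumes Python's standard `gzip`, `bz2` and `lzma` modules as stand-ins for the tools tested here: `gzip` and `bz2` produce the same formats as the gzip and bzip2 programs, but `lzma` writes liblzma's .xz format and its presets 1..9 are not the lzmash levels, so the numbers will not match the tables. `CODECS`, `size_table`, `ratio_percent` and `print_table` are hypothetical names:

```python
import bz2
import gzip
import lzma

# Stand-ins for the command-line tools. "xz" is liblzma's .xz format, whose
# presets 1..9 do NOT correspond to the lzmash levels in the tables.
CODECS = {
    "gzip": lambda data, level: gzip.compress(data, compresslevel=level),
    "bzip2": lambda data, level: bz2.compress(data, compresslevel=level),
    "xz": lambda data, level: lzma.compress(data, preset=level),
}


def size_table(data: bytes, levels=range(1, 10)) -> dict:
    """Return {level: {codec name: compressed size in bytes}}."""
    return {
        level: {name: len(compress(data, level)) for name, compress in CODECS.items()}
        for level in levels
    }


def ratio_percent(compressed: int, uncompressed: int) -> str:
    """Compressed size / uncompressed size * 100%, with a decimal comma as in the tables."""
    return f"{compressed / uncompressed * 100:.1f}%".replace(".", ",")


def print_table(data: bytes, levels=range(1, 10)) -> None:
    """Print a ratio table laid out like the ones in this article."""
    print("\t" + "\t\t".join(CODECS))
    for level, sizes in size_table(data, levels).items():
        cells = (ratio_percent(sizes[name], len(data)) for name in CODECS)
        print(f"{level}\t" + "\t\t".join(cells))
```

For example, `print_table(open("file.tar", "rb").read())` prints a ratio table for that file. Note that `lzma` presets 8 and 9 can need several hundred megabytes of RAM to compress.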

Tarball made from a full installation of OpenOffice.org 1.1.4 for Linux

Uncompressed size: 212664320 bytes (203 MB)

Compressed file size in bytes
	gzip		bzip2		lzmash		lzmash -e
1	86322815	76147880	67456213	-
2	84858575	74320824	62085798	-
3	83561997	73467586	59547691	59278372
4	81312776	73044026	58245872	57964166
5	79798262	72762041	56694215	56411631
6	79179298	72540199	56182079	55859514
7	78995264	72512833	55535273	55269226
8	78816280	72314472	54678948	54405078
9	78768334	72223858	54068819	53769958

Compressed size / Uncompressed size * 100%
	gzip		bzip2		lzmash		lzmash -e
1	40,6%		35,8%		31,7%		-
2	39,9%		34,9%		29,2%		-
3	39,3%		34,5%		28,0%		27,9%
4	38,2%		34,3%		27,4%		27,3%
5	37,5%		34,2%		26,7%		26,5%
6	37,2%		34,1%		26,4%		26,3%
7	37,1%		34,1%		26,1%		26,0%
8	37,1%		34,0%		25,7%		25,6%
9	37,0%		34,0%		25,4%		25,3%

Compression time
	gzip		bzip2		lzmash		lzmash -e
1	 11.5s		1m 26s		 0m 58s		-
2	 12.0s		1m 40s		 2m  7s		-
3	 13.7s		1m 54s		 4m 58s		 7m 37s
4	 15.1s		2m  5s		 5m 26s		 8m  2s
5	 18.4s		2m 11s		 6m 47s		11m 18s
6	 24.5s		2m 18s		 7m 30s		12m  4s
7	 29.4s		2m 25s		 8m 24s		12m 59s
8	 45.5s		2m 32s		10m 59s		20m 17s
9	 66.9s		2m 37s		12m 20s		21m 56s

Decompression time
	gzip		bzip2		lzmash		lzmash -e
1	3.3s		16.5s		11.3s		-
2	3.3s		24.2s		10.5s		-
3	3.3s		29.2s		10.5s		10.4s
4	3.3s		32.1s		10.4s		10.3s
5	3.2s		34.2s		10.2s		10.2s
6	3.2s		35.4s		10.2s		10.1s
7	3.2s		36.5s		10.1s		10.0s
8	3.2s		37.5s		10.0s		 9.9s
9	3.1s		38.2s		10.0s		 9.9s

Compression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
	gzip		bzip2		lzmash		lzmash -e
1	18		2.4		3.5		-
2	17		2.0		1.6		-
3	15		1.8		0.68		0.44
4	13		1.6		0.62		0.42
5	11		1.5		0.50		0.30
6	 8.3		1.5		0.45		0.28
7	 6.9		1.4		0.40		0.26
8	 4.5		1.3		0.31		0.17
9	 3.0		1.3		0.27		0.15

Decompression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
	gzip		bzip2		lzmash		lzmash -e
1	61		12		18		-
2	61		 8.4		19		-
3	61		 6.9		19		20
4	61		 6.3		20		20
5	63		 5.9		20		20
6	63		 5.7		20		20
7	63		 5.6		20		20
8	63		 5.4		20		20
9	65		 5.3		20		20
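Each speed figure is the uncompressed size divided by the measured time, with 1 MB = 1024 * 1024 bytes. The sketch below recomputes entries from the OpenOffice.org size and times above; the helper names and the parser for time cells such as "1m 26s" are hypothetical, written only for the formats used in these tables:

```python
import re

MIB = 1024 * 1024  # the speed tables use 1 MB = 1024 * 1024 bytes


def parse_seconds(cell: str) -> float:
    """Parse a time cell such as '11.5s', '1m 26s' or '12m  4s' into seconds."""
    m = re.fullmatch(r"\s*(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized time: {cell!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))


def speed_mb_per_s(uncompressed_bytes: int, cell: str) -> float:
    """Throughput in MB/s of *uncompressed* data, as in the speed tables."""
    return uncompressed_bytes / MIB / parse_seconds(cell)
```

For instance, `speed_mb_per_s(212664320, "1m 26s")` gives about 2.36, shown as 2.4 for "bzip2 -1", and "11.5s" gives about 17.6, shown as 18 for "gzip -1"; the tables appear to round to two significant digits.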

The Linux kernel 2.6.11.0 source tarball

Uncompressed size: 208250880 bytes (199 MB)

Compressed file size in bytes
	gzip		bzip2		lzmash		lzmash -e
1	57860603	43873922	43933138	-
2	55274813	41108704	38871392	-
3	53416918	39791569	34863499	34823465
4	49695438	39040694	33545762	33513509
5	47775348	38395197	32481024	32445716
6	47004031	37975094	31686173	31661947
7	46797152	37676593	30881464	30841602
8	46578138	37365408	30295730	30261027
9	46578138	37075679	29809336	29780803

Compressed size / Uncompressed size * 100%
	gzip		bzip2		lzmash		lzmash -e
1	27,8%		21,1%		21,1%		-
2	26,5%		19,7%		18,7%		-
3	25,7%		19,1%		16,7%		16,7%
4	23,9%		18,7%		16,1%		16,1%
5	22,9%		18,4%		15,6%		15,6%
6	22,6%		18,2%		15,2%		15,2%
7	22,5%		18,1%		14,8%		14,8%
8	22,4%		17,9%		14,5%		14,5%
9	22,4%		17,8%		14,3%		14,3%

Compression time
	gzip		bzip2		lzmash		lzmash -e
1	 8.3s		1m  9s		 0m 45s		-
2	 8.7s		1m 22s		 1m 45s		-
3	 9.8s		1m 34s		 5m 10s		 8m 43s
4	11.1s		1m 45s		 5m 43s		 9m 41s
5	13.8s		1m 57s		 7m 39s		14m 38s
6	17.8s		2m  2s		 8m 23s		15m 32s
7	20.7s		2m 11s		 9m 11s		16m 23s
8	29.7s		2m 21s		11m 34s		24m 47s
9	40.9s		2m 26s		12m 31s		25m 53s

Decompression time
	gzip		bzip2		lzmash		lzmash -e
1	2.8s		12.8s		7.7s		-
2	2.7s		19.4s		6.9s		-
3	2.6s		23.8s		6.4s		6.6s
4	2.5s		26.4s		6.3s		6.3s
5	2.5s		28.3s		6.3s		6.3s
6	2.4s		29.6s		6.2s		6.3s
7	2.4s		30.6s		6.2s		6.2s
8	2.4s		31.3s		6.1s		6.1s
9	2.4s		32.1s		6.1s		6.1s

Compression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
	gzip		bzip2		lzmash		lzmash -e
1	24		2.9		4.4		-
2	23		2.4		1.9		-
3	20		2.1		0.64		0.38
4	18		1.9		0.58		0.34
5	14		1.7		0.43		0.23
6	11		1.6		0.39		0.21
7	 9.6		1.5		0.36		0.20
8	 6.7		1.4		0.29		0.13
9	 4.9		1.4		0.26		0.13

Decompression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
	gzip		bzip2		lzmash		lzmash -e
1	71		16		26		-
2	74		10		29		-
3	76		 8.3		31		30
4	79		 7.5		32		32
5	79		 7.0		32		32
6	83		 6.7		32		32
7	83		 6.5		32		32
8	83		 6.3		33		33
9	83		 6.2		33		33

In this test, bzip2 is a tough adversary to lzmash in the fast modes; at level 1, bzip2 even produces a slightly smaller file. "lzmash -e" makes files a few tens of kB smaller at the expense of a much longer compression time.


XMMS 1.2.10 binary package

The XMMS 1.2.10 binary package (xmms-1.2.10-i486-2.tgz) is from Slackware 10.1. The file was first gunzipped, giving an uncompressed size of 5498880 bytes (5.2 MB).

Compressed file size in bytes
	gzip		bzip2		lzmash		lzmash -e
1	2160102		1803573		1431699		-
2	2112332		1611408		1140030		-
3	2072044		1539083		1034903		1038615
4	2031519		1487237		1004176		1007692
5	1992713		1464332		 987189		 988758
6	1979068		1433617		 983305		 983198
7	1973404		1431276		 982125		 983240
8	1972424		1414142		 980836		 983582
9	1970643		1385112		 980836		 983582

Compressed size / Uncompressed size * 100%
	gzip		bzip2		lzmash		lzmash -e
1	39,3%		32,8%		26,0%		-
2	38,4%		29,3%		20,7%		-
3	37,7%		28,0%		18,8%		18,9%
4	36,9%		27,0%		18,3%		18,3%
5	36,2%		26,6%		18,0%		18,0%
6	36,0%		26,1%		17,9%		17,9%
7	35,9%		26,0%		17,9%		17,9%
8	35,9%		25,7%		17,8%		17,9%
9	35,8%		25,2%		17,8%		17,9%

Compression time
	gzip		bzip2		lzmash		lzmash -e
1	0.3s		2.4s		 1.4s		-
2	0.3s		2.9s		 2.7s		-
3	0.4s		3.2s		 6.2s		 8.9s
4	0.4s		3.3s		 6.6s		 9.3s
5	0.5s		4.6s		 8.2s		13.3s
6	0.7s		5.6s		 8.5s		13.7s
7	0.8s		4.7s		 8.6s		13.6s
8	1.1s		4.9s		10.5s		21.5s
9	1.8s		5.1s		10.5s		21.5s

Decompression time
	gzip		bzip2		lzmash		lzmash -e
1	0.1s		0.4s		0.3s		-
2	0.1s		0.6s		0.2s		-
3	0.1s		0.7s		0.2s		0.2s
4	0.1s		0.8s		0.2s		0.2s
5	0.1s		0.9s		0.2s		0.2s
6	0.1s		0.9s		0.2s		0.2s
7	0.1s		0.9s		0.2s		0.2s
8	0.1s		1.0s		0.2s		0.2s
9	0.1s		1.0s		0.2s		0.2s

For some reason, "bzip2 -6" took more time than even "bzip2 -9"; the result didn't change when the test was repeated. With this file, the extreme mode of lzmash creates files up to a few kilobytes larger at every level except 6, so it seems that with small files "lzmash -e" can make compression both slower and less efficient. Speed tables are omitted because, with such a small test file, measuring the time with the 'time' command is too inaccurate.


XMMS 1.2.10 source tarball

Uncompressed size: 15964160 bytes (15.2 MB)

Compressed file size in bytes
	gzip		bzip2		lzmash		lzmash -e
1	4705710		3702465		3390291		-
2	4560441		3172615		2117511		-
3	4460478		2914692		1921894		1929077
4	4213705		2748562		1803104		1808532
5	4095300		2670185		1721301		1723689
6	4060060		2591439		1642013		1643645
7	4046707		2500735		1540827		1541735
8	4035433		2464688		1533283		1531514
9	4034855		2418265		1533283		1531514

Compressed size / Uncompressed size * 100%
	gzip		bzip2		lzmash		lzmash -e
1	29,5%		23,2%		21,2%		-
2	28,6%		19,9%		13,3%		-
3	27,9%		18,3%		12,0%		12,1%
4	26,4%		17,2%		11,3%		11,3%
5	25,7%		16,7%		10,8%		10,8%
6	25,4%		16,2%		10,3%		10,3%
7	25,3%		15,7%		 9,7%		 9,7%
8	25,3%		15,4%		 9,6%		 9,6%
9	25,3%		15,1%		 9,6%		 9,6%

Compression time
	gzip		bzip2		lzmash		lzmash -e
1	0.7s		 6.1s		 3.5s		-
2	0.7s		 7.3s		 6.0s		-
3	0.8s		 8.5s		19.0s		 30.8s
4	0.9s		 9.9s		19.9s		 31.2s
5	1.1s		11.2s		28.9s		1m  1s
6	1.4s		11.0s		30.1s		1m  2s
7	1.7s		12.5s		30.9s		1m  4s
8	2.5s		15.9s		41.7s		1m 56s
9	2.9s		17.5s		41.7s		1m 56s

Decompression time
	gzip		bzip2		lzmash		lzmash -e
1	0.2s		1.0s		0.6s		-
2	0.2s		1.5s		0.4s		-
3	0.2s		1.9s		0.4s		0.4s
4	0.2s		2.1s		0.4s		0.4s
5	0.2s		2.3s		0.4s		0.4s
6	0.2s		2.5s		0.4s		0.4s
7	0.2s		2.6s		0.4s		0.4s
8	0.2s		2.7s		0.4s		0.4s
9	0.2s		2.8s		0.4s		0.4s

For some reason, "bzip2 -6" compressed slightly faster than "bzip2 -5", yet "bzip2 -6" still created a smaller file. Speed tables are omitted because, with such a small test file, measuring the time with the 'time' command is too inaccurate.


Memory requirements

The memory requirements depend only on the compression level used (-1 .. -9). bzip2 also has a mode that uses less memory but is slower; this small-memory mode was not tested.

RAM usage on compression
	gzip		bzip2		lzmash		lzmash -e
1	<1 MB		2 MB		  2 MB		 -
2	<1 MB		2 MB		 12 MB		 -
3	<1 MB		3 MB		 12 MB		 12 MB
4	<1 MB		4 MB		 16 MB		 16 MB
5	<1 MB		5 MB		 26 MB		 26 MB
6	<1 MB		5 MB		 45 MB		 45 MB
7	<1 MB		6 MB		 83 MB		 83 MB
8	<1 MB		7 MB		159 MB		159 MB
9	<1 MB		7 MB		311 MB		311 MB

RAM usage on decompression
	gzip		bzip2		lzmash		lzmash -e
1	<1 MB		1 MB		 1 MB		 -
2	<1 MB		2 MB		 2 MB		 -
3	<1 MB		2 MB		 1 MB		 1 MB
4	<1 MB		2 MB		 2 MB		 2 MB
5	<1 MB		3 MB		 3 MB		 3 MB
6	<1 MB		3 MB		 5 MB		 5 MB
7	<1 MB		3 MB		 9 MB		 9 MB
8	<1 MB		4 MB		17 MB		17 MB
9	<1 MB		4 MB		33 MB		33 MB

Conclusions

Compression

When very fast compression is needed, gzip is the clear winner. It also has a very small memory footprint, making it ideal for systems with limited memory.

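In these tests, bzip2 at its default level created files roughly 9% to 40% smaller than gzip at its default level, depending on the input. bzip2 compresses considerably slower than gzip (six to eight times slower at the default levels with the two large test files), but that does not seem to have prevented bzip2 from becoming popular. Nowadays most source code is available as both gzip- and bzip2-compressed tar archives.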

"lzmash -3" and "lzmash -4" run at almost the same speed (or slowness); the same can be said of "lzmash -5", "lzmash -6" and "lzmash -7". However, the memory requirements increase with every level, so "lzmash -3", "lzmash -5" and "lzmash -6" are usually useful only if you (or the recipient) do not have enough memory for "lzmash -4" or "lzmash -7".
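The reasoning above amounts to choosing the highest level that fits the memory available on both the compressing and the decompressing machine. A minimal Python sketch under that assumption, using only the lzmash figures from the memory tables; the function name and the idea of two separate RAM budgets are illustrative, not part of lzmash:

```python
from typing import Optional

# RAM needed by lzmash at levels 1..9, in MB, copied from the memory tables above.
COMPRESS_MB = {1: 2, 2: 12, 3: 12, 4: 16, 5: 26, 6: 45, 7: 83, 8: 159, 9: 311}
DECOMPRESS_MB = {1: 1, 2: 2, 3: 1, 4: 2, 5: 3, 6: 5, 7: 9, 8: 17, 9: 33}


def pick_lzmash_level(compress_ram_mb: int, decompress_ram_mb: int,
                      max_level: int = 7) -> Optional[int]:
    """Return the highest level whose RAM use fits both budgets, or None.

    Higher lzmash levels never produced larger files in these tests, and
    within the groups 3-4 and 5-7 they are about equally fast, so the highest
    fitting level is preferred. max_level defaults to 7 (lzmash's default);
    levels 8 and 9 are noticeably slower to compress.
    """
    fitting = [
        level for level in range(1, max_level + 1)
        if COMPRESS_MB[level] <= compress_ram_mb
        and DECOMPRESS_MB[level] <= decompress_ram_mb
    ]
    return max(fitting) if fitting else None
```

For example, `pick_lzmash_level(64, 16)` returns 6 because level 7 needs 83 MB to compress, and `pick_lzmash_level(20, 8)` returns 4.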

"lzmash -8" and "lzmash -9" need a lot of memory (159 MB and 311 MB for compression) and are practical only on newer computers; files compressed with them are probably a pain to decompress on systems with less than 32 MB or 64 MB of memory.

The extreme mode ("lzmash -e") roughly doubles the compression time, and especially with small files it can even lead to a worse compression ratio than the normal mode. The extreme mode might be worth trying if you want to make files as small as possible, but in that case, skipping the lzmash wrapper script and experimenting with the command-line options of "lzma" directly can lead to better results.
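As a rough illustration of tuning the encoder directly instead of relying on numbered levels, the sketch below uses Python's `lzma` module (liblzma, not the LZMA SDK 4.17 tool benchmarked here) to try a few combinations of the LZMA literal-context (lc), literal-position (lp) and position (pb) bits on top of the extreme preset, keeping the smallest output. Which settings win depends entirely on the data; the function name, the 1 MiB default dictionary and the search grid are assumptions:

```python
import itertools
import lzma


def smallest_lzma_alone(data: bytes, dict_size: int = 1 << 20):
    """Compress data in the legacy .lzma format with several lc/lp/pb settings
    and return (smallest output, settings that produced it)."""
    best = None
    for lc, lp, pb in itertools.product((0, 3, 4), (0, 1, 2), (0, 2)):
        if lc + lp > 4:  # liblzma requires lc + lp <= 4
            continue
        filters = [{
            "id": lzma.FILTER_LZMA1,
            "preset": 9 | lzma.PRESET_EXTREME,  # source of defaults for other options
            "dict_size": dict_size,
            "lc": lc, "lp": lp, "pb": pb,
        }]
        out = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)
        if best is None or len(out) < len(best[0]):
            best = (out, {"lc": lc, "lp": lp, "pb": pb})
    return best
```

The search compresses the input a dozen times, so it trades even more compression time for size than the extreme mode alone; the xz(1) manual notes, for example, that pb=0 can help slightly with one-byte-aligned text.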

Decompression

In terms of speed, gzip is the winner again. lzma comes right behind it, decompressing two to three times slower than gzip. bzip2 is a lot slower, usually taking two to six times as long as lzma, that is, four to twelve times as long as gzip. Interestingly, gzip and lzma decompress faster the smaller the compressed file is, while bzip2 gets slower as the compression ratio improves.

The memory usage of lzma stays competitive with bzip2 when files have been compressed with "lzmash -6" or a lower level. Files compressed with the default "lzmash -7" can still be decompressed on machines with only 16 MB of RAM, but sometimes even that much memory is not available. If you compress with "lzmash -8" or "lzmash -9", consider whether your users also need to decompress the files on "ancient" computers.
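Whether a given machine can decompress a file depends mostly on the dictionary size chosen at compression time, and the legacy .lzma format records it in the file header. A hedged Python sketch that reads that header, assuming the files use the 13-byte header of the LZMA SDK's .lzma format (one properties byte encoding lc, lp and pb, a 4-byte little-endian dictionary size and an 8-byte little-endian uncompressed size, all ones meaning unknown); the function names are hypothetical and the RAM figure is only an estimate:

```python
import struct
from typing import NamedTuple, Optional


class LzmaHeader(NamedTuple):
    lc: int
    lp: int
    pb: int
    dict_size: int
    uncompressed_size: Optional[int]  # None means "unknown" (all 0xFF bytes)


def parse_lzma_alone_header(data: bytes) -> LzmaHeader:
    """Decode the 13-byte header at the start of a legacy .lzma file."""
    if len(data) < 13:
        raise ValueError("a .lzma header is 13 bytes long")
    props = data[0]
    if props >= 9 * 5 * 5:
        raise ValueError(f"invalid properties byte {props}")
    # The properties byte packs the three parameters as (pb * 5 + lp) * 9 + lc.
    lc, rest = props % 9, props // 9
    lp, pb = rest % 5, rest // 5
    dict_size, size = struct.unpack("<IQ", data[1:13])
    return LzmaHeader(lc, lp, pb, dict_size, None if size == 0xFFFFFFFFFFFFFFFF else size)


def decoder_ram_estimate(h: LzmaHeader) -> int:
    """Rough decoder RAM in bytes: the dictionary buffer plus the probability
    tables, assuming 16-bit probabilities as in the LZMA SDK reference decoder.
    The decompressing program's own overhead is not included."""
    num_probs = 1846 + (768 << (h.lc + h.lp))
    return h.dict_size + 2 * num_probs
```

With the common settings lc=3 and lp=0, the probability tables add only about 16 KB, so the dictionary size dominates the estimate.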

So what is the best?

Of course, it depends on the intended application. gzip is very fast and has a small memory footprint. According to this benchmark, neither bzip2 nor lzma can compete with gzip in terms of speed or memory usage. bzip2 has a notably better compression ratio than gzip, which is presumably the reason for its popularity; it is slower than gzip, especially in decompression, and uses more memory. However, the memory requirements of bzip2 should not be a problem nowadays, even on older hardware.

Both gzip and bzip2 are bundled with practically all GNU/*/Linux distributions and *BSDs. Because everybody has the tools to handle gzip- and bzip2-compressed files, they are by far the most commonly used formats for distributing, e.g., source code of free software. However, the situation might change because better free (as in freedom) alternatives have become available.

LZMA clearly has the potential to become the third commonly used general-purpose compression format on *NIX systems. It competes mainly with bzip2, offering a significantly better compression ratio while keeping decompression speed relatively close to that of gzip. It has already proven itself in the Tukaani Linux package management system and in software installers such as Nullsoft Scriptable Install System (NSIS), Inno Setup and the installers of the MS-Windows versions of Mozilla products, including Firefox and Thunderbird.
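Adding a third format mostly means teaching tools to recognize it. A small Python sketch of a decompressor that accepts all three kinds of files, using the standard `gzip`, `bz2` and `lzma` modules; it assumes the LZMA files are in the legacy .lzma ("LZMA_Alone") format, which has no magic number, so anything that is not gzip or bzip2 is handed to liblzma's auto-detection (which also accepts .xz). `decompress_any` is a hypothetical name:

```python
import bz2
import gzip
import lzma


def decompress_any(data: bytes) -> bytes:
    """Decompress gzip, bzip2, .xz or legacy .lzma data, chosen by sniffing the header."""
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes
        return gzip.decompress(data)
    if data[:3] == b"BZh":  # bzip2 magic ("BZh" followed by the block-size digit)
        return bz2.decompress(data)
    # .lzma has no magic number, so let liblzma tell .xz and .lzma apart.
    return lzma.decompress(data, format=lzma.FORMAT_AUTO)
```

Sniffing the content rather than trusting the file extension keeps the function working for renamed or piped data.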
