Embedded Database Performance Comparison

wjason

Tags: embedded, HSQLDB, MySQL, Derby, JDBC

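P.S. 1. The images were omitted when this article was reposted; readers who need them can consult the original link.

2. The link given is itself a repost.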

Performance Benchmarking of Embedded Databases

Update: I’ve updated some of the results to include HSQLDB’s CACHED tables.

Introduction

As part of my PhD research I am developing a fairly complex simulation of pedestrian movement. Well, it’s at least moderately complex, particularly when scaled up to tens of thousands of people! I have developed the simulation with MySQL as a backend, serving both to provide the input data (overall configuration, street maps, routes, agent preferences, etc.) and to collect the output data (basically, an event is generated whenever two pedestrians walk past each other). Further overviews of my research can be found in the research section.

MySQL was initially chosen because I had some experience with it via PHP, because version 4.1 supports OpenGIS definitions (which, for example, allow me to query for all streets within a particular area), and because it has plenty of formal and informal support (particularly documentation and management tools). In order to explore the parameter space of my simulation I need to deploy it on the University’s research cluster. However, accessing a remote database, particularly for such frequent input/output, was neither plausible nor efficient.

The easiest solution was to replace MySQL with a pure-Java, embedded database. Embedded databases eschew the client-server architecture of a mainstream database (such as Oracle, Postgres, MySQL) and instead execute within the JVM and store their data in local files. Most embedded databases still use the standard JDBC interface through which traditional client-server databases are accessed. Thus, you should be able to change the database drivers (and possibly make minor modifications to the SQL) and everything would work just like with MySQL, only faster and with no network requirements. In reality, it isn’t quite that simple.
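The driver swap described above might look like the following minimal sketch. It is not taken from the benchmark source: the class name `DriverSwapSketch` and its two helper methods are hypothetical, and the default driver and URL are simply the Derby values quoted later in this article. The only assumption is that the chosen driver's jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, for illustration only; not part of the benchmark suite.
class DriverSwapSketch {

    /** Returns true when the named driver class can be loaded from the classpath. */
    static boolean driverAvailable(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Loads the driver, then opens a connection; only the two strings differ per database. */
    static Connection open(String driverClass, String url)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverClass); // older drivers register themselves when loaded
        return DriverManager.getConnection(url);
    }

    public static void main(String[] args) throws Exception {
        // Defaults are the Derby driver and URL used as the example in this article.
        String driver = args.length > 0 ? args[0] : "org.apache.derby.jdbc.EmbeddedDriver";
        String url = args.length > 1 ? args[1] : "jdbc:derby:derbytest;create=true";
        if (!driverAvailable(driver)) {
            System.out.println("Driver not on classpath: " + driver);
            return;
        }
        try (Connection c = open(driver, url)) {
            System.out.println("Connected to " + c.getMetaData().getDatabaseProductName());
        }
    }
}
```

In principle, everything downstream of `open` stays the same when the two strings change; in practice the SQL dialects differ enough that some statements need adjusting too.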

It seemed like a wise step to spend a short time comparing the available solutions, which is what this article is all about. Please note that, although I’ve been a Java programmer for the past 8 years, I am not a database expert. I make no claims to the accuracy of this article. Please don’t simply scan the graphs and read the conclusions without understanding what I have benchmarked! Also, I wouldn’t try to generalise the results too much; it’s always best to benchmark your own candidates with a representative use case from your application. I supply the benchmark source code for you to peruse and adapt as you require.

Candidate Databases

So, based on Google, Java-source and general knowledge, I went in search of a pure Java embedded database which would fulfil my criteria:

Free for non-commercial/educational use and/or open source (well, I am a student!)
Save to local files (which should be documented so I can retrieve them from the remote machines)
Support JDBC and act as a relational database
Support auto-incremented integer columns
Be fast and small with minimal external configuration
INSERT operations are probably more important than SELECTs so optimisation there would be appreciated
Good documentation
Recently updated and under active development

Probably the most widely used embedded database is Sleepycat’s BerkeleyDB, of which they now offer a pure Java version. Personally I’m a bit of a fan of this database, but it uses a record structure, not a relational one, and therefore it doesn’t support JDBC. I’ve found three other possible candidates: HSQLDB, Derby (previously IBM’s Cloudscape) and Daffodil’s One$DB. I give a very brief overview of each database below, but the primary focus of this article is performance benchmarking, not feature comparison.

MySQL

For the sake of completeness, I include MySQL here as it is the database I’m migrating from.
MySQL is a popular cross-platform open-source database with extensive documentation, books and tools. MySQL is a native application, which is accessed in a client-server fashion, and is widely used in combination with PHP and Apache for web applications.

HSQLDB

HSQLDB, previously known as HypersonicDB, is a mature Java embedded database which has recently found favour with the OpenOffice.org team and will be integrated with their forthcoming database office application, Base. HSQLDB is often used in combination with Hibernate. The website for HSQLDB is fairly plain, but easy to navigate, and the documentation appears to be quite comprehensive. There are a few tools supplied with HSQLDB, including a database browser, but several other tools also support this database (including the useful and attractive DbVisualizer).

Derby

Derby (previously known as Cloudscape) was recently open-sourced by IBM and contributed to the Apache project. As such, it is a mature product but a relatively unknown quantity to most developers. The website is clean and there is a good level of useful documentation available. There are a few tools provided with the distribution, including a rudimentary viewer and a command-line interface.

Daffodil One$DB

One$DB is the open-source version of DaffodilDB but retains most of its features. One$DB was open-sourced in December 2004 and can be embedded into your application or run as a typical client-server database. One$DB is supplied with a database browser (although you can also use DbVisualizer) and a good selection of well-presented documentation (although their SQL reference could use some examples). Daffodil can also supply an ODBC driver and a database replicator which work with One$DB.

Method

My primary goal was to perform the same tests on each database using the default setup for each system. To this end I’ve written a class, unimaginatively called Benchmark, which creates a table, INSERTs some rows and then reads them back using a SELECT statement. Originally, I had intended to use the same SQL for all four databases, but there were enough differences that each database now has its own SQL statements. I was concerned that many of the embedded databases would be doing some heavy caching. To exclude this possibility, the database connection is closed after each operation:

start timer 1
open the database
DROP the table if it exists and (re-)CREATE the table
close the database
stop timer 1

start timer 2
open the database
INSERT n rows using a PreparedStatement
close the database
stop timer 2

start timer 3
open the database
SELECT all rows using a PreparedStatement
close the database
stop timer 3
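The procedure above can be sketched in Java roughly as follows. This is a reconstruction under assumptions, not the actual Benchmark class: the `PhaseTimer` class and the `Opener` and `Phase` interfaces are hypothetical names, and `System.currentTimeMillis()` is only a guess at the timer used. The point it illustrates is that opening and closing the connection happen inside each timed phase, so nothing cached on one connection carries over to the next phase.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch of the three timed phases; not the original Benchmark class.
class PhaseTimer {

    /** Hypothetical hook: how a fresh connection is opened for each phase. */
    interface Opener { Connection open() throws SQLException; }

    /** Hypothetical hook: the work done inside one timed phase. */
    interface Phase { void run(Connection c) throws SQLException; }

    /** Times open + work + close together and returns the elapsed milliseconds. */
    static long timePhase(Opener opener, Phase phase) throws SQLException {
        long start = System.currentTimeMillis();
        Connection c = opener.open();
        try {
            phase.run(c);
        } finally {
            c.close(); // closed inside the timed region, as in the steps above
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: java PhaseTimer <jdbc-url>");
            return;
        }
        String url = args[0];
        Opener opener = () -> DriverManager.getConnection(url);
        // The phase bodies are left empty here; each would hold the real SQL work.
        long t1 = timePhase(opener, c -> { /* DROP and re-CREATE the table */ });
        long t2 = timePhase(opener, c -> { /* INSERT n rows */ });
        long t3 = timePhase(opener, c -> { /* SELECT all rows */ });
        System.out.println("create=" + t1 + " insert=" + t2 + " select=" + t3);
    }
}
```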

This is an example of the table used in the benchmark. It is a simple combination of popular column types which I shall require in my work:
CREATE TABLE test (
    id INTEGER GENERATED BY DEFAULT AS IDENTITY,
    name VARCHAR(254) NOT NULL,
    value INT NOT NULL,
    date DATE NOT NULL,
    longnumber BIGINT NOT NULL,
    floatnumber FLOAT NOT NULL,
    PRIMARY KEY (id)
)
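For illustration, filling this table through a PreparedStatement could look like the sketch below. The class `InsertSketch`, the generated values and the one-statement-per-row loop are assumptions rather than the benchmark's actual code; only the column names come from the table definition above. The `id` column is left out of the INSERT so that the database generates it.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch; the real benchmark uses its own SQL for each database.
class InsertSketch {

    /** One placeholder per non-identity column of the test table. */
    static final String INSERT_SQL =
        "INSERT INTO test (name, value, date, longnumber, floatnumber) VALUES (?, ?, ?, ?, ?)";

    /** Inserts n rows of made-up data, reusing a single PreparedStatement. */
    static void insertRows(Connection c, int n) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < n; i++) {
                ps.setString(1, "name" + i);
                ps.setInt(2, i);
                ps.setDate(3, new Date(System.currentTimeMillis()));
                ps.setLong(4, (long) i * 1_000_000L);
                ps.setDouble(5, i / 3.0);
                ps.executeUpdate(); // one statement execution per row
            }
        }
    }
}
```

Reusing one PreparedStatement means the SQL is prepared once per batch of rows, although each row is still executed separately here.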

Implementation

I won’t write too much on the actual implementation of the benchmarks since it’s pretty simple and boring. The whole suite (such as it is) is available for download [~6MB]. It contains the source code, all the required libraries and a Netbeans 4.0 project to build it with. Of course, since Netbeans actually uses ANT as a build environment you can just type ant jar in the main dbBenchmark directory (sorry for the blatant publicising but I’m a bit of a Netbeans 4 fan). There are a bunch of .bat files which may need to be modified for your environment, but they should provide you with an understanding of the various command-line options.

Feel free to modify the benchmarks to suit your own requirements. I won’t dignify it by slapping an open-source licence on it but treat it as public domain code—although if you make any significant changes I’d be interested to hear about it in the comments below.

Usage

The benchmark is invoked from the command line using the following options:

usage: java com.ideasasylum.dbbenchmark.Benchmark

-I,--increment	Increment the selects and inserts for load testing
-b,--benchmark	The class name of the benchmark to execute
-d,--database	The JDBC URL of the database
-j,--driver	JDBC driver
-n,--runs	The number of benchmark runs to perform
-o,--output	The output file name (CSV format)
-p,--password	The database password
-r,--rows	The number of selects to perform (selects should be >= inserts)
-u,--username	The database username

For example, the following command will benchmark a Derby database using the Benchmark class com.ideasasylum.dbbenchmark.DerbyBenchmark, the driver org.apache.derby.jdbc.EmbeddedDriver and the database jdbc:derby:derbytest;create=true. It will perform 10 runs, starting with 5000 inserts and incrementing this by 5000 each time (so the last run will be inserting and retrieving 50000 rows). The output is sent to a comma-separated file, derbyload.csv.

java -classpath dist/dbBenchmark.jar;lib/derby.jar;lib/commons-cli-1.0.jar com.ideasasylum.dbbenchmark.Benchmark -b com.ideasasylum.dbbenchmark.DerbyBenchmark -j org.apache.derby.jdbc.EmbeddedDriver -d jdbc:derby:derbytest;create=true -n 10 -r 5000 -o derbyload.csv --increment=5000
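The row counts implied by these options follow a simple arithmetic progression, which the short sketch below spells out. The class `RunSizes` and its method are hypothetical; it only restates the description above (10 runs, starting at 5000 rows and adding 5000 each time) and is not code from the benchmark.

```java
// Hypothetical illustration of how -n, -r and --increment combine.
class RunSizes {

    /** Rows used in run runIndex (0-based): the starting count plus runIndex increments. */
    static int rowsForRun(int startRows, int increment, int runIndex) {
        return startRows + increment * runIndex;
    }

    public static void main(String[] args) {
        int runs = 10, startRows = 5000, increment = 5000; // -n 10 -r 5000 --increment=5000
        for (int i = 0; i < runs; i++) {
            System.out.println("run " + (i + 1) + ": " + rowsForRun(startRows, increment, i) + " rows");
        }
    }
}
```

These are the same row counts that label the first column of the result tables.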

Configuration

Name	Version	Driver
MySQL	4.1.7-nt	com.mysql.jdbc.Driver (version 3.1.7)
HSQLDB	1.7.3	org.hsqldb.jdbcDriver
Derby	10.0.2.1	org.apache.derby.jdbc.EmbeddedDriver
One$DB	4.0	in.co.daffodil.db.jdbc.DaffodilDBDriver

All tests were performed using Java 5 on a 2.6GHz P4, 1GB RAM and Windows XP. The machine was lightly loaded (e.g. email client etc). The results from each run are output into a CSV file. An Excel spreadsheet links in these files and plots a few graphs, calculates some averages etc. The data supplied here is not definitive but it is enough to provide a quick impression of the speed of each database.

During the course of the experiments it became clear that HSQLDB was unbelievably fast. Too fast. Upon checking the documentation, I discovered that HSQLDB should be shut down by sending it the SQL command SHUTDOWN. It doesn’t shut down properly if only connection.close() is used although, since it is a reliable database, no data loss appears to occur. I’ve included a separate series called “HSQLDB Shutdown” which benchmarks HSQLDB when it is shut down properly. Update: As someone pointed out, by default HSQLDB creates in-memory tables unless you use CREATE CACHED TABLE. I’ve added two more columns which show the results when HSQLDB is run using CACHED tables, with and without a proper SHUTDOWN.
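The two HSQLDB adjustments described above could be expressed roughly as in this sketch. The class `HsqldbShutdownSketch` and both method names are hypothetical; only the SHUTDOWN command and the CREATE CACHED TABLE form come from the text. How the driver behaves when the statement and connection are closed after SHUTDOWN may depend on the HSQLDB version, so treat the cleanup order as an assumption.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the "Shutdown" and "Cached" variants; not the benchmark's code.
class HsqldbShutdownSketch {

    /** Picks the DDL prefix: plain CREATE TABLE (in-memory by default) or CREATE CACHED TABLE. */
    static String createTablePrefix(boolean cached) {
        return cached ? "CREATE CACHED TABLE" : "CREATE TABLE";
    }

    /** Sends SHUTDOWN to the database before closing the connection. */
    static void shutdownAndClose(Connection c) throws SQLException {
        try (Statement st = c.createStatement()) {
            st.execute("SHUTDOWN");
        } finally {
            c.close();
        }
    }
}
```

Because the benchmark closes the database inside each timed phase, the cost of a proper shutdown is included in the figures for the Shutdown series.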

Results

Drop/Create Performance

Rows	MySQL	HSQLDB	Derby	Daffodil	HSQLDB Shutdown	HSQLDB Cached	HSQLDB Cached Shutdown
5000	532	3844	7625	4172	906	938	624
10000	141	0	454	750	375	16	219
15000	94	0	390	735	531	0	219
20000	125	0	344	703	750	0	203
25000	94	16	360	688	938	0	219
30000	94	0	406	704	1172	16	235
35000	93	0	375	703	1328	15	437
40000	110	0	359	672	1563	15	281
45000	94	0	297	688	1750	15	250
50000	110	0	344	703	1922	16	188

Insertion Performance

Rows	MySQL	HSQLDB	Derby	Daffodil	HSQLDB Shutdown	HSQLDB Cached	HSQLDB Cached Shutdown
5000	113527	312	7078	4781	766	765	1860
10000	227511	422	10374	7156	890	625	1609
15000	332400	672	184	10531	1250	829	1984
20000	433583	890	21687	17499	1625	1110	2874
25000	540303	1094	35968	18765	1937	1375	3624
30000	671339	1313	32125	21733	2406	2531	4187
35000	792639	1500	37531	25156	2687	2375	4250
40000	934574	1750	42984	28827	3109	3188	5187
45000	1048950	1953	48499	32171	3469	2797	4828
50000	1246853	2140	53780	35093	3796	4547	4843
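One way to read the insertion table is as a cost per row. The sketch below divides the 50000-row figures by the row count; the class `PerRowCost` is hypothetical and the arithmetic is only an aid to interpretation. The tables do not state a time unit (milliseconds seems likely, but that is an assumption), so the results are in that same unstated unit per row.

```java
// Hypothetical helper for interpreting the insertion table; not part of the benchmark.
class PerRowCost {

    /** Elapsed time divided by the number of rows, in the table's own time unit. */
    static double perRow(long elapsed, int rows) {
        return (double) elapsed / rows;
    }

    public static void main(String[] args) {
        int rows = 50000; // last row of the insertion table
        String[] names = {"MySQL", "HSQLDB", "Derby", "Daffodil"};
        long[] elapsed = {1246853L, 2140L, 53780L, 35093L};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + perRow(elapsed[i], rows) + " per row");
        }
    }
}
```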

Selection Performance

Rows	MySQL	HSQLDB	Derby	Daffodil	HSQLDB Shutdown	HSQLDB Cached	HSQLDB Cached Shutdown
5000	297	63	187	406	594	63	390
10000	265	31	188	359	891	46	328
15000	391	47	110	422	1328	46	391
20000	421	47	140	516	1734	62	516
25000	531	62	156	0	2016	63	703
30000	579	62	187	782	2469	266	859
35000	0	203	203	719	2828	282	922
40000	828	109	390	782	3125	344	968
45000	843	110	453	891	3484	125	1062
50000	921	125	375	1031	3813	984	1141

Other Data

MySQL displays very low CPU utilisation (~2% for the benchmark application and 5% for the mysql server) which probably indicates that the bottleneck is the client-server I/O (which is to be expected).
One$DB and HSQLDB had a very high CPU utilisation (>95% measured using the Windows XP Task Manager). Again, this was expected since the databases are integrated into the application and the only bottleneck is how fast it can process the data.
Derby had a lower CPU usage than the other embedded databases (~40-70%).

2008-03-19 09:12