- 浏览: 901403 次
- 性别:
- 来自: 大连
文章分类
- 全部博客 (319)
- Thinking / 反思 (27)
- 我读的技术类图书 (3)
- 我读的非技术图书 (3)
- Java & Groovy (55)
- Ruby/Rails (9)
- Python (10)
- C/C++ (14)
- C# & .net (9)
- 互联网相关技术 (6)
- Database (6)
- Unix/Linux (6)
- WindowsDev (21)
- 工具使用 / Tips (62)
- 编程技术杂谈/咨讯 (6)
- 软工 / 敏捷 / 模式 (6)
- 易筋经 / 各种内功 (3)
- 充电 / 他学科知识 (6)
- 外语学习 (16)
- 我和宝宝的甜蜜生活 (24)
- 八卦 (3)
- 健康 (0)
- 无类别 (0)
- mTogether (4)
- 一页纸 (3)
- SAP (7)
- baby (2)
- abap (2)
- temp (1)
- network (1)
- 生活 (1)
最新评论
-
daliang1215:
收藏一下,好东西。 xp 的快捷键用的非常爽,到win7缺没有 ...
Windows7: 右键任务栏上的一个窗口, 用快捷键c关闭它 -
Alice南京:
感谢
Java GC 监视方法与工具 -
wjason:
今天在excel 2010上面写了一些代码,果然lookup有 ...
Excel 公式: 根据一个单元格的用户输入值, 自动设置另一个单元格的值 -
wjason:
因式分解:http://zh.wikipedia.org/wi ...
教孩子学编程: 数学题1 -
bbls:
不错 找了好久了
VS2010: 在Solution Explorer中,自动关联当前正在编辑的文件
Performance Benchmarking of Embedded Databases
update: I’ve update some of the results to include HSQLDB’s CACHED tables.
Introduction
As part of my PhD research I am developing a fairly complex simulation of pedestrian movement. Well, it’s at least moderately complex, particularly when scaled up to tens of thousands of people! I’ve have developed the simulation with MySQL as a backend, serving both to provide the input data (overall configuration, street maps, routes, agent preferences, etc) and collect the output data (basically an event is generated whenever two pedestrians walk past each other). Further overviews on my research can be found at in the research section
MySQL was initially chosen because I had some experience with it via PHP, version 4.1 supports OpenGIS definitions (which, for example, allows me to query for all streets within a particular area) and it has plenty of formal and informal support (partiularly documentation and management tools). In order to explore the parameter space of my simulation I need to deploy it on the University’s research cluster. However, accessing a remote database, particularly for such frequent input/output, was not a plausible or efficient situation. The easiest solution was to replace MySQL with a pure-Java, embedded database. Embedded databases eschew the client-server architecture of a mainstream database (such as Oracle, Postgres, MySQL) and instead execute within the JVM and store their data in local files. Most embedded databases still use the standard JDBC interface through which traditional client-server databases are accessed. Thus, you should be able to change the database drivers (and possibly make minors modifications to the SQL) and everything would work just like with MySQL—but faster and with no network requirements. In reality, it isn’t quite that simple.
It seemed like a wise step to spend a short time comparing the available solutions which is what this article is all about. Please note that, although I’m a Java programmer for the past 8 years, I am not a database expert. I make no claims to the accuracy of this article. Please don’t simply scan the graphs and read the conclusions without understanding what I have benchmarked! Also, I wouldn’t try to generalise the results too much—it’s always best to benchmark your own candidates with a representative usecase from your application. I supply the benchmark source code for you to peruse and adapt as you require.
Candidate Databases
So, based on Google, Java-source and general knowledge, I went in search of a pure Java embedded database which would fulfil my criteria:
- Free for non-commercial/educational use and/or open source (well, I am a student!)
- Save to local files (which should be documented so I can retrieve them from the remote machines)
- Support JDBC and act as a relational database
- Support auto-incremented integer columns
- Be fast and small with minimal external configuration
- INSERT operations are probably more important than SELECTs so optimisation there would be appreciated
- Good documentation
- Recently updated and under active development
Probably the most widely used embedded database is Sleepycat’s BerkeleyDB, of which they now offer a pure Java version. Personally I’m a bit of a fan of this database but it uses a record structure, not a relational one, and therefore it doesn’t support JDBC. I’ve found three other possible candidates: HSQLDB, Derby (previously IBM’s Cloudscape) and Daffodil’s One$DB. I give a very brief overvew of each database below but the primary focus of this article is performance benchmarking not feature comparison.
MySQL
For the sake of completeness, I include MySQL here as it is the database I’m
migrating from.
MySQL is a popular cross-platform open-source database with
extensive documentation, books and tools. MySQL is a native
application, which is accessed in a client-server fashion, and is
widely used in combination with PHP and Apache for web
applications.
HSQLDB
HSQLDB, previously known as HypersonicDB, is a mature Java embedded database which has recently found favour with the OpenOffice.org team and it will be integrated with their forthcoming database office application, Base. HSQLDB is often used in combination with Hibernate. The website for HSQLDB is fairly plain, but easy to navigate, and the documentation appears to be quite comprehensive. There are a few tools supplied with HSQLDB, including a database browser, but several other tools also support this database (including the useful and attractive DbVisualiser)
Derby
Derby (previously known as Cloudscape) was recently open-sourced by IBM and contributed to the Apache project. As such, it is a mature product but a relatively unknown quantity to most developers. The website is clean and there is a good level of useful documentation available. There are a few tools provided with the distribution including a rudimentary viewer and a command -line interface.
Daffodi One$DB
One$DB is the open-source version of DaffodilDB but retains most of it’s features. One$DB was open sourced in December 2004 and can be embedded into your application or in a typical client-server database. One$DB is supplied with a database browser (although you can also use DbVisualiser) and a good selection of well-presented documentation (although their SQL reference could use some examples). Daffodil can also supply an ODBC driver and database replicator which work with One$DB.
Method
My primary goals were to perform the same tests on each database using the default setup for each system. To this end I’ve written a class, unimaginatively called Benchmark, which creates a table, INSERTs some rows and then reads them back using a SELECT statement. Originally, I had intended to use the same SQL for all four databases but there were enough differences that each database now has its own SQL statements. I was concerned that many of the embedded database would be doing some heavy caching. To exclude this possibility, the database connection is closed after each operation:
start timer 1
open database
DROP the table if it exists and (re-)CREATE the table
close the database
stop timer 1
start timer 2
open the database
INSERT n rows using a PreparedStatement
close the database
stop timer 2
start timer 3
open the database
SELECT all rows using a PreparedStatement
close the database
stop timer 3
This is an example of the table used in the benchmark. It is a
simple combination of popular column types which I shall require in
my work:
CREATE TABLE test (
id INTEGER GENERATED BY DEFAULT AS IDENTITY,
name VARCHAR( 254 ) NOT NULL,
value INT NOT NULL,
date DATE NOT NULL,
longnumber BIGINT NOT NULL,
floatnumber FLOAT NOT NULL,
PRIMARY KEY ( id ))
Implementation
I won’t write too much on the actual implementation of the
benchmarks since its pretty simple and boring. The whole suite
(such as it is) is available
for download [~6MB]. It contains the source code, all the
required libraries and a Netbeans 4.0 project to build it with. Of
course, since Netbeans actually uses ANT as a build environment you
can just type ant jar
it
the main dbBenchmark directory (sorry for the blatant publisising
but I’m a bit of a Netbeans 4 fan). There are a bunch of .bat
files which may need to be modified for your environments but they
should provide you with an understanding of the various
command-line options.
Feel free to modify the benchmarks to suit your own requirements. I won’t dignify it by slapping an open-source licence on it but treat it as public domain code—although if you make any significant changes I’d be interested to hear about it in the comments below.
Usage
The benchmark is invoked from the command line using the following options:
usage: java
com.ideasasylum.dbbenchmark.Benchmark
-I,—increment | Increment the select and inserts for load testing |
-b,—benchmark | The class name of the benchmark to execute |
-d,—database | The JDBC url of the database |
-j,—driver | JDBC Driver |
-n,—runs | The number of benchmark run to perform |
-o,—output | The output file name (CSV format) |
-p,—password | The database password |
-r,—rows | The number of selects to perform (selects should be >= inserts) |
-u,—username | The database username |
For example, the following command will benchmark a Derby
database using the Benchmark class com.ideasasylum.dbbenchmark.DerbyBenchmark
,
the driver org.apache.derby.jdbc.EmbeddedDriver
and the
database jdbc:derby:derbytest;create=true
. It will
perform 10 runs, starting with 5000 inserts and incrementing this
by 5000 each time (so the last run will be inserting and retrieving
50000 rows). The output is sent to a comma-seperated file,
derbyload.csv
.
java -classpath
dist/dbBenchmark.jar;lib/derby.jar;lib/commons-cli-1.0.jar
com.ideasasylum.dbbenchmark.Benchmark -b
com.ideasasylum.dbbenchmark.DerbyBenchmark -j
org.apache.derby.jdbc.EmbeddedDriver -d
jdbc:derby:derbytest;create=true -n 10 -r 5000 -o derbyload.csv
--increment=5000
Configuration
Name | Version | Driver |
MySQL | 4.1.7-nt | com.mysql.jdbc.Driver (version 3.1.7) |
HSQLDB | 1.7.3 | org.hsqldb.jdbcDriver |
Derby | 10.0.2.1 | org.apache.derby.jdbc.EmbeddedDriver |
One$DB | 4.0 | in.co.daffodil.db.jdbc.DaffodilDBDriver |
All tests were performed using Java 5 on a 2.6GHz P4, 1GB RAM and Windows XP. The machine was lightly loaded (e.g. email client etc). The results from each run are output into a CSV file. An Excel spreadsheet links in these files and plots a few graphs, calculates some averages etc. The data supplied here is not definitive but it is enough to provide a quick impression of the speed of each database.
During the course of the experiments it became clear that HSQLDB was unbelievably fast. Too fast. Upon checking the documentation, I discovered that HSQLDB should be shutdown by sending it the SQL command, SHUTDOWN. It doesn’t shutdown properly if connection.close() is used although, since it is a reliable database, no data loss appears to occur. I’ve included a seperate series called “HSQLDB Shutdown” which benchmarks HSQLDB when it is shutdown properly. Update: As someone pointed out, by default HSQLDB creates in-memory tables unless you use CREATE CACHED TABLE. I’ve added two more columns which show the results when HSQLDB is run using CACHED tables and when SHUTDOWN properly.
Results
Drop/Create Performance
Rows | MySQL | HSQLDB | Derby | Daffodil | HSQLDB Shutdown | HSQLDB Cached | HSQLDB Cached Shutdown |
5000 | 532 | 3844 | 7625 | 4172 | 906 | 938 | 624 |
10000 | 141 | 0 | 454 | 750 | 375 | 16 | 219 |
15000 | 94 | 0 | 390 | 735 | 531 | 0 | 219 |
20000 | 125 | 0 | 344 | 703 | 750 | 0 | 203 |
25000 | 94 | 16 | 360 | 688 | 938 | 0 | 219 |
30000 | 94 | 0 | 406 | 704 | 1172 | 16 | 235 |
35000 | 93 | 0 | 375 | 703 | 1328 | 15 | 437 |
40000 | 110 | 0 | 359 | 672 | 1563 | 15 | 281 |
45000 | 94 | 0 | 297 | 688 | 1750 | 15 | 250 |
50000 | 110 | 0 | 344 | 703 | 1922 | 16 | 188 |
Insertion Performance
Rows | MySQL | HSQLDB | Derby | Daffodil | HSQLDB Shutdown | HSQLDB Cached | HSQLDB Cached Shutdown |
5000 | 113527 | 312 | 7078 | 4781 | 766 | 765 | 1860 |
10000 | 227511 | 422 | 10374 | 7156 | 890 | 625 | 1609 |
15000 | 332400 | 672 | 184 | 10531 | 1250 | 829 | 1984 |
20000 | 433583 | 890 | 21687 | 17499 | 1625 | 1110 | 2874 |
25000 | 540303 | 1094 | 35968 | 18765 | 1937 | 1375 | 3624 |
30000 | 671339 | 1313 | 32125 | 21733 | 2406 | 2531 | 4187 |
35000 | 792639 | 1500 | 37531 | 25156 | 2687 | 2375 | 4250 |
40000 | 934574 | 1750 | 42984 | 28827 | 3109 | 3188 | 5187 |
45000 | 1048950 | 1953 | 48499 | 32171 | 3469 | 2797 | 4828 |
50000 | 1246853 | 2140 | 53780 | 35093 | 3796 | 4547 | 4843 |
Selection Performance
Rows | MySQL | HSQLDB | Derby | Daffodil | HSQLDB Shutdown | HSQLDB Cached | HSQLDB Cached Shutdown |
5000 | 297 | 63 | 187 | 406 | 594 | 63 | 390 |
10000 | 265 | 31 | 188 | 359 | 891 | 46 | 328 |
15000 | 391 | 47 | 110 | 422 | 1328 | 46 | 391 |
20000 | 421 | 47 | 140 | 516 | 1734 | 62 | 516 |
25000 | 531 | 62 | 156 | 0 | 2016 | 63 | 703 |
30000 | 579 | 62 | 187 | 782 | 2469 | 266 | 859 |
35000 | 0 | 203 | 203 | 719 | 2828 | 282 | 922 |
40000 | 828 | 109 | 390 | 782 | 3125 | 344 | 968 |
45000 | 843 | 110 | 453 | 891 | 3484 | 125 | 1062 |
50000 | 921 | 125 | 375 | 1031 | 3813 | 984 | 1141 |
Other Data
- MySQL displays very low CPU utilisation (~2% for the benchmark application and 5% for the mysql server) which probably indicates that the bottleneck is the client-server I/O (which is to be expected).
- One$DB and HSQLDB had a very high CPU utilisation (>95% measured using the Windows XP Task Manager). Again, this was expected since the databases are integrated into the application and the only bottleneck is how fast it can process the data.
- Derby had a lower CPU usage than the other embedded databases (~40-70%)
发表评论
-
Aqua Data Studio - Tips
2009-04-03 17:20 2008整理Aqua Data Studio的两个tips 1. ... -
Derby: Out Of Memory & Update
2009-03-31 08:55 2332下面是一堆流水帐 慎入: 一. 问题 Java GC 监 ... -
Aqua Data Studio操作mysql乱码问题
2009-02-09 11:12 6763上周末, 有一堆Mantis用户需要注册, 可惜Mantis的 ... -
Derby总结备忘
2008-12-31 15:24 1601我现在所在的项目组中,使用Derby,以前对derby的使用方 ... -
some books of oracle
2008-03-05 07:24 1794今天和俺们家宝宝聊了下oracle的书,上网搜了这么两篇帖子 ...
相关推荐
嵌入式数据库是一种专为在有限资源环境下运行而设计的数据库管理系统,常见于移动设备、物联网设备...在实际的毕业论文设计中,可能会详细探讨这些知识点,并通过实验验证和对比分析来展现嵌入式数据库设计的各个方面。
### 常用的嵌入式数据库比较 #### 一、Berkeley DB **技术特点:** 1. **开源性与高性能:**Berkeley DB是一个开放源代码的内嵌式数据库管理系统,它能够为应用程序提供高性能的数据管理服务。开发者只需要通过...
在选择数据库引擎的过程中,作者对比了多种嵌入式数据库,最终选择了SQLite。SQLite因其高效查询性能和优秀的存储空间优化能力,成为构建电子词典的理想选择。系统功能设计上,参考了金山词霸,选取了核心功能,包括...
嵌入式数据库的对比主要考虑因素包括性能、内存占用、API接口的易用性、跨平台兼容性以及是否支持SQL。在选择合适的嵌入式数据库时,开发人员需要根据项目的需求,如数据量、实时性要求、开发语言支持和许可证成本等...
### Sqlite和嵌入式数据库表基础 #### Sqlite简介 Sqlite是一种轻量级的数据库管理系统,由D. Richard Hipp于2000年开始开发。最初版本在2001年发布,到了2004年发布了重要的3.0版本,引入了新的数据文件格式和...
### SQLite嵌入式数据库的应用研究知识点详述 #### 一、引言与背景 - **嵌入式数据库系统概述**:随着技术的发展,嵌入式系统越来越多地被应用于日常生活中的各种设备,如智能手机、平板电脑、智能手表等。为了更好...
本文主要探讨了两种流行的嵌入式数据库——SQLite和BerkeleyDB,并对其特性进行了详细的比较。 ##### SQLite SQLite是一款用C语言编写的轻量级关系型数据库引擎,它支持大部分SQL标准,并且无需独立的服务器进程或...
它同样是一种嵌入式数据库,但与SQLite不同,Berkeley DB更侧重于高性能和高可靠性。 - **特点**: - **高性能**:Berkeley DB专为高性能设计,特别是在读写性能方面表现突出。 - **事务处理**:支持复杂的事务...
- **简介**:Oracle Berkeley DB 是一个高性能、工业级别的嵌入式数据库解决方案,它提供了一套完整的工具包用于构建各种类型的应用程序。 - **特性**: - **C-Library 模式**:以 C 语言库的形式提供,可以嵌入到...
LokiJS是一个专为JavaScript设计的轻量级、高效的嵌入式内存数据库。它主要用于客户端应用,特别是Web浏览器或Node.js环境中的数据管理。LokiJS的设计目标是提供快速、灵活的数据存储解决方案,无需依赖外部数据库...
本文主要探讨了基于嵌入式数据库SQLite的采煤机智能监控系统的设计与实现,详细分析了系统的功能设计、数据存储、故障处理及优化等方面的内容。 首先,SQLite作为一款轻量级的嵌入式数据库,被广泛应用于各种设备和...
SQLite的优势在于其占用内存少,运行速度快,特别是在对比MySQL和PostgreSQL等大型数据库时,其性能优势尤为明显。SQLite支持ACID事务,这意味着它保证了数据的一致性和完整性,支持视图、子查询和触发器。此外,...
**H2嵌入式数据库详解** H2数据库是一款开源、高性能、轻量级的Java数据库管理系统,它支持多种模式,包括单用户模式、服务器模式、内存模式和嵌入式模式。在嵌入式模式下,H2数据库可以直接作为应用程序的一部分...
嵌入式数据库MySQL作为一款广泛应用于Web应用程序的关系型数据库管理系统,提供了多种模糊查询方法以满足不同场景的需求。本文主要探讨了MySQL中的两种主要模糊查询方式:Like匹配和RegExp正则匹配,以及如何通过...
常用内存数据库的比较 内存数据库是一种新型的数据库管理系统,它将整个数据库放入内存中,以提高数据库的性能和响应速度。与传统的磁盘数据库相比,内存数据库具有很多优势,如快速的数据访问速度、低延迟、低成本...
【嵌入式数据库的基本概念】 嵌入式数据库,顾名思义,是设计用于嵌入到其他应用程序中使用的数据库管理系统。它们不需独立运行的数据库引擎,而是与应用程序紧密集成,通过应用程序直接调用API来处理数据。这种...
嵌入式数据库是一种针对特定应用场合,特别是在资源有限的嵌入式系统中使用的轻量级数据库系统。与传统的客户端/服务器(C/S)结构的关系型数据库(如MySQL)相比,嵌入式数据库更加紧凑,无需独立运行的数据库...
3. 其他常见嵌入式数据库对比 - **PostgreSQL**:虽然功能强大且免费,但其体积较大,不适合作为嵌入式数据库。 - **MySQL**:多用户支持,但商业应用可能需要许可费用,且在嵌入式场景中占用空间大。 - **mSQL**:...