HA JDBC – High Availability JDBC - 东写西读终见大海无量

yangzb

Some time ago I worked on a project which was in need of a way to cluster databases. For those of you who don’t exactly know what database clustering is: database clustering is a way to have multiple databases work together to act like a single database. A cluster of databases typically has the following benefits:

A cluster has a query throughput that is much higher than just a single database.
A database in the cluster can fail without losing access to the data.

The result is that you now have access to a ‘database’ that is never down and can handle a lot more queries than a single database could. We didn’t care much about the higher query throughput but were very interested in a database that would always be available.

After some research I found out that there were a lot of products available to accomplish this.

We decided to use HA-JDBC (High Availability JDBC), an open source Java framework that offers database clustering. In this blog entry I will describe my experiences with this relatively unknown framework, and I hope to hear from other people who use it. But first, I will explain how the framework does its job.

HOW HA-JDBC WORKS

Normally when you connect to a database, you use a JDBC connection. HA-JDBC wraps one or more of these connections and acts as a proxy in your Java code. Your code interacts only with the proxy, which transparently communicates with multiple databases.
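The proxy idea can be illustrated with a plain Java dynamic proxy. This is only a sketch of the concept, not HA-JDBC's implementation: the `Store` interface, `RecordingStore` and `cluster` are hypothetical names made up for this example, and a real cluster would also have to deal with failures and result comparison.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FanOutProxyDemo {
    /** Hypothetical stand-in for a JDBC connection; not an HA-JDBC type. */
    public interface Store {
        int update(String sql);
    }

    /** Records every statement it receives, like one database in the cluster. */
    public static class RecordingStore implements Store {
        public final List<String> log = new ArrayList<>();

        @Override
        public int update(String sql) {
            log.add(sql);
            return 1; // pretend one row was affected
        }
    }

    /** Builds a proxy that forwards each call to every target and returns the first result. */
    public static Store cluster(List<? extends Store> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("a cluster needs at least one target");
        }
        InvocationHandler handler = (proxy, method, args) -> {
            Object first = null;
            boolean seen = false;
            for (Store target : targets) {
                Object result = method.invoke(target, args);
                if (!seen) {
                    first = result;
                    seen = true;
                }
            }
            return first;
        };
        return (Store) Proxy.newProxyInstance(
                Store.class.getClassLoader(), new Class<?>[] {Store.class}, handler);
    }

    public static void main(String[] args) {
        RecordingStore a = new RecordingStore();
        RecordingStore b = new RecordingStore();
        Store proxy = cluster(Arrays.asList(a, b));
        proxy.update("INSERT INTO t VALUES (1)");
        System.out.println(a.log.equals(b.log)); // prints true: both targets saw the statement
    }
}
```

The calling code only ever holds the `Store` returned by `cluster`, which is the sense in which the clustering is transparent to the application.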

In order for HA-JDBC to know which connections to proxy, it needs an XML configuration file. This file defines how the cluster is configured. It defines the information needed to connect to the databases, but also the behaviour of the cluster itself. Below you can read about the experiences I’ve had with a few different aspects of HA-JDBC.

SYNCHRONIZATION

One aspect of HA-JDBC is synchronization. Synchronization in HA-JDBC is the process of bringing an out-of-date database up to date again by comparing its data with other databases in the cluster. A database can get out of date when it is turned off for some reason. Whether it died on its own or someone turned it off on purpose, it needs synchronization to bring its data up to date again. When a database goes down, users of the application will not notice: they can continue working as if nothing happened, because the other databases in the cluster are still available.

While a database is down, it quickly becomes out of date because it no longer processes any updates. When the database is started again, HA-JDBC picks this up and executes the synchronization strategy configured in the XML configuration file. The downside is that the built-in strategies of HA-JDBC are not very efficient. These are the ones packaged with HA-JDBC:

FullSynchronizationStrategy: Deletes the content of all tables of the database that is being updated and fills them again with data from a different (up-to-date) database in the cluster.
DifferentialSynchronizationStrategy: Compares all rows of the out-of-date database with a different (up-to-date) database in the cluster to find out which rows need to be updated, inserted or deleted.
PassiveSynchronizationStrategy: Assumes that no updates have taken place during the downtime and therefore does nothing.
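The difference between the full and the differential approach can be sketched on in-memory maps, where each map stands in for one table keyed by primary key. This is a simplified illustration under that assumption, not the code of the HA-JDBC strategy classes; `DiffSyncSketch` and its method names are invented for this example.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class DiffSyncSketch {
    /** Full approach: throw away the stale table and copy everything. */
    public static void fullSync(Map<Integer, String> fresh, Map<Integer, String> stale) {
        stale.clear();
        stale.putAll(fresh);
    }

    /** Differential approach: touch only rows that differ; returns the number of changes. */
    public static int differentialSync(Map<Integer, String> fresh, Map<Integer, String> stale) {
        // Delete rows that no longer exist in the up-to-date table.
        int before = stale.size();
        stale.keySet().retainAll(fresh.keySet());
        int changes = before - stale.size();

        // Insert missing rows and update rows whose content differs.
        for (Map.Entry<Integer, String> row : fresh.entrySet()) {
            boolean missing = !stale.containsKey(row.getKey());
            if (missing || !Objects.equals(stale.get(row.getKey()), row.getValue())) {
                stale.put(row.getKey(), row.getValue());
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<Integer, String> fresh = new TreeMap<>();
        fresh.put(1, "a");
        fresh.put(2, "b");
        fresh.put(3, "c");

        Map<Integer, String> stale = new TreeMap<>();
        stale.put(1, "a");
        stale.put(2, "x");
        stale.put(4, "d");

        // One delete (4), one update (2) and one insert (3).
        System.out.println(differentialSync(fresh, stale)); // prints 3
        System.out.println(stale.equals(fresh));            // prints true
    }
}
```

Both approaches still have to read every row of the up-to-date table, which is consistent with neither of them scaling well to large tables.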

A nice thing about HA-JDBC is that you can implement your own synchronization strategy. Since the strategies mentioned above take hours to complete on large tables, we decided to write one ourselves. Our strategy requires tables to have a timestamp column for versioning and does not support deletes, but it turned out to be a lot faster than the built-in strategies of HA-JDBC.
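The original strategy code is not shown here, so the following is only a guess at the general idea: copy just the rows whose timestamp is newer than the newest timestamp the stale database already has. The `Row` type, the `modifiedAt` field and the in-memory maps are assumptions for illustration and do not use the HA-JDBC strategy interface.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampSyncSketch {
    /** A row with a payload and a last-modified timestamp (hypothetical schema). */
    public static final class Row {
        public final String value;
        public final long modifiedAt;

        public Row(String value, long modifiedAt) {
            this.value = value;
            this.modifiedAt = modifiedAt;
        }
    }

    /** Copies rows modified after the stale table's newest timestamp; returns rows copied. */
    public static int sync(Map<Integer, Row> fresh, Map<Integer, Row> stale) {
        // The newest timestamp in the stale table marks where it stopped receiving updates.
        long highWater = Long.MIN_VALUE;
        for (Row row : stale.values()) {
            highWater = Math.max(highWater, row.modifiedAt);
        }

        int copied = 0;
        for (Map.Entry<Integer, Row> entry : fresh.entrySet()) {
            if (entry.getValue().modifiedAt > highWater) {
                stale.put(entry.getKey(), entry.getValue());
                copied++;
            }
        }
        // Rows deleted from `fresh` are never noticed: deletes are not supported.
        return copied;
    }

    public static void main(String[] args) {
        Map<Integer, Row> fresh = new HashMap<>();
        fresh.put(1, new Row("a", 10));
        fresh.put(2, new Row("b2", 30)); // updated while the other database was down
        fresh.put(3, new Row("c", 40));  // inserted while the other database was down

        Map<Integer, Row> stale = new HashMap<>();
        stale.put(1, new Row("a", 10));
        stale.put(2, new Row("b", 20));
        stale.put(9, new Row("z", 5));   // deleted in `fresh`, but it survives the sync

        System.out.println(sync(fresh, stale));    // prints 2
        System.out.println(stale.containsKey(9));  // prints true
    }
}
```

Against a real database the second loop would be a query filtered on the timestamp column, so only the changed rows are read. That is where the speed-up over a row-by-row comparison would come from.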

ID GENERATION

When using a single database you can let the database itself generate IDs for records you insert. When using multiple databases with HA-JDBC you can still do that, but there is no guarantee that all databases in the cluster will generate the same ID. When a different ID is generated in each database, this will leave your cluster in an invalid state because now all the databases in this cluster contain different data.
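A small simulation shows how this can go wrong. `FakeDatabase` is a made-up class that mimics an auto-increment column with a counter; the starting values are arbitrary and represent two databases whose sequences have drifted apart.

```java
import java.util.concurrent.atomic.AtomicLong;

public class IdDivergenceDemo {
    /** Mimics a database that assigns IDs from its own private sequence. */
    public static final class FakeDatabase {
        private final AtomicLong sequence;

        public FakeDatabase(long lastIssuedId) {
            this.sequence = new AtomicLong(lastIssuedId);
        }

        /** Inserts a row and returns the ID this database generated for it. */
        public long insert(String row) {
            return sequence.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        // Assumption: the second database's sequence is one step ahead of the first.
        FakeDatabase first = new FakeDatabase(41);
        FakeDatabase second = new FakeDatabase(42);

        // The same logical insert is sent to both databases.
        long idInFirst = first.insert("order for customer 7");
        long idInSecond = second.insert("order for customer 7");

        System.out.println(idInFirst + " vs " + idInSecond); // prints 42 vs 43
    }
}
```

From this point on, the same row has a different primary key in each database, so later updates or foreign keys that refer to it by ID no longer mean the same thing everywhere.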

Of course there is a solution to this problem, but it isn’t pretty. When using an ORM tool like Hibernate, you can specify a generator for the ID field. By default Hibernate makes the database responsible for generating IDs, which is not what we want. When using HA-JDBC you should use one of the following generators:

UUID generator
HiLo generator

Neither of these generators depends on an individual database, which is just what we need, but they do not produce conventional IDs. For example, the UUID generator produces IDs like ‘4028828d-0dc7f2a2-010d-c7f2a4d3-0013’. This value is based on the current timestamp and the IP address of the machine the application runs on. The HiLo generator produces ordinary numbers, but they do not increase one by one as you may be used to. The first number it generates could be 432 and the next one 33200, which you would not expect from IDs.
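The jumps in HiLo numbers follow from how the hi/lo technique works: a generator reserves a whole block of IDs by taking one "hi" value from a shared source, then hands out the "lo" values inside that block without any further coordination. The sketch below is a generic version of that technique, not Hibernate's implementation. `HiSource` and `HiLoGenerator` are hypothetical names, and the block size of 32768 is an assumption, chosen because it equals the gap between 432 and 33200 in the example above.

```java
public class HiLoSketch {
    /** Shared source of "hi" values, standing in for a table all applications can reach. */
    public static final class HiSource {
        private long next = 0;

        public synchronized long nextHi() {
            return next++;
        }
    }

    /** Hands out IDs from a locally reserved block; fetches a new block when it runs out. */
    public static final class HiLoGenerator {
        private final HiSource source;
        private final int blockSize;
        private long hi;
        private int lo;

        public HiLoGenerator(HiSource source, int blockSize) {
            this.source = source;
            this.blockSize = blockSize;
            this.lo = blockSize; // forces a block to be reserved on first use
        }

        public synchronized long next() {
            if (lo >= blockSize) {
                hi = source.nextHi();
                lo = 0;
            }
            return hi * blockSize + lo++;
        }
    }

    public static void main(String[] args) {
        HiSource shared = new HiSource();
        HiLoGenerator appOne = new HiLoGenerator(shared, 32768);
        HiLoGenerator appTwo = new HiLoGenerator(shared, 32768);

        System.out.println(appOne.next()); // prints 0
        System.out.println(appTwo.next()); // prints 32768
        System.out.println(appOne.next()); // prints 1
    }
}
```

Every generator instance, and every restart of the application, reserves a fresh block, which is why consecutive inserts can receive IDs that are tens of thousands apart.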

NON-INTRUSIVE?

When I started using HA-JDBC, I expected it to be non-intrusive to our project, because the only things we needed to do were to change our JDBC driver and write a simple XML configuration file for it. But as you have read in this blog entry, you first need to write your own synchronization strategy, and second, you will probably have to switch to UUIDs. This requires a lot of refactoring all over your code, because you are now switching from Long typed IDs to String typed IDs. So in fact it does influence your project more than you would expect.

CONCLUSION

In conclusion, HA-JDBC is very easy to set up and has well-written documentation on its website. It performs quite well, especially with a customized synchronization strategy. Since it delegates calls directly to the underlying JDBC drivers, it is fast and has full JDBC support. You also don’t need anything other than your database servers and your application servers. But there are a few issues with HA-JDBC that are a bit annoying: you will probably end up with UUIDs for the records in your database, and you will have to write your own synchronization strategy.

I was wondering if there were any other people that have some experience with this database clustering approach and would like to share their experiences. So if you have any experience with HA-JDBC, don’t hesitate to leave a comment!

2010-02-27 18:46