yangzb

HA-JDBC – High Availability JDBC

Some time ago I worked on a project that needed a way to cluster databases. For those of you who don't know exactly what database clustering is: it is a way to have multiple databases work together so that they act like a single database. A cluster of databases typically has the following benefits:

  • A cluster can reach a much higher query throughput than a single database, because queries can be spread across its members.
  • A single database in the cluster can fail without the application losing access to the data.

The result is that you now have access to a ‘database’ that stays available when individual databases go down and can handle a lot more queries than a single database could. We didn't care much about the higher query throughput, but we were very interested in a database that would always be available.

After some research I found that many products are available to accomplish this.

We decided to use HA-JDBC (High Availability Java Database Connectivity), an open source Java framework that offers database clustering. In this blog entry I will describe my experiences with this relatively unknown framework, and I hope to hear from other people who also use it. But first, I will try to explain how the framework does its job.

HOW HA-JDBC WORKS

Normally, when you connect to a database, you use a JDBC connection. HA-JDBC wraps one or more connections and acts as a proxy in your Java code. This means that your code interacts with the proxy, which transparently communicates with multiple databases.
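
To make the proxy idea concrete, here is a minimal sketch built on the JDK's java.lang.reflect.Proxy. It is not HA-JDBC's code: HA-JDBC proxies the real java.sql interfaces, whereas SimpleDb, InMemoryDb and ClusterHandler below are hypothetical names for a two-method toy interface. The handler forwards writes to every active member, answers reads from a single member, and deactivates a member whose write throws, which is what keeps the ‘database’ available when one server dies.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a JDBC connection (HA-JDBC itself proxies the real java.sql interfaces).
interface SimpleDb {
    void write(String record);   // modifies data: must reach every member
    List<String> read();         // reads data: any one member can answer
}

// Hypothetical in-memory member database.
class InMemoryDb implements SimpleDb {
    private final List<String> rows = new ArrayList<>();
    public void write(String record) { rows.add(record); }
    public List<String> read() { return new ArrayList<>(rows); }
}

// Routes writes to all active members and reads to a single member.
class ClusterHandler implements InvocationHandler {
    private final List<SimpleDb> active;

    private ClusterHandler(List<SimpleDb> members) {
        this.active = new ArrayList<>(members);
    }

    static SimpleDb cluster(List<SimpleDb> members) {
        return (SimpleDb) Proxy.newProxyInstance(
                SimpleDb.class.getClassLoader(),
                new Class<?>[] {SimpleDb.class},
                new ClusterHandler(members));
    }

    @Override
    public synchronized Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().equals("write")) {
            Iterator<SimpleDb> it = active.iterator();
            while (it.hasNext()) {
                try {
                    method.invoke(it.next(), args);
                } catch (InvocationTargetException e) {
                    it.remove(); // member failed: deactivate it, the others keep serving
                }
            }
            if (active.isEmpty()) {
                throw new IllegalStateException("no active databases left");
            }
            return null;
        }
        if (active.isEmpty()) {
            throw new IllegalStateException("no active databases left");
        }
        try {
            return method.invoke(active.get(0), args); // reads: first active member only
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the member's own exception
        }
    }
}
```

Calling write("row-1") on a proxy over two InMemoryDb instances stores the row in both; if one member throws on a write, it is dropped and reads continue from the remaining member.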

In order for HA-JDBC to know which connections to proxy, it needs an XML configuration file. This file defines how the cluster is configured: both the information needed to connect to each database and the behaviour of the cluster itself. Below you can read about my experiences with a few different aspects of HA-JDBC.

SYNCHRONIZATION

One aspect of HA-JDBC is synchronization: the process of bringing an out-of-date database up to date again by comparing its data with that of other databases in the cluster. A database gets out of date when it is taken offline for some reason. It may have crashed, or someone may have shut it down on purpose; either way, it needs synchronization before its data is up to date again. When a database goes down, users of the application will not notice. They can keep using the application as if nothing is going on, because the other databases in the cluster are still available.

While a database is down, it quickly gets out of date because it no longer processes any updates. When the database is started again, HA-JDBC picks this up and executes the synchronization strategy configured in the XML configuration file. The downside is that HA-JDBC's built-in strategies are not very efficient. These are the ones packaged with HA-JDBC:

  • FullSynchronizationStrategy: Deletes the contents of all tables in the database being synchronized and refills them with data from another (up-to-date) database in the cluster.
  • DifferentialSynchronizationStrategy: Compares every row of the out-of-date database with another (up-to-date) database in the cluster to determine which rows need to be updated, inserted or deleted.
  • PassiveSynchronizationStrategy: Assumes that no updates took place during the downtime and therefore does nothing.
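
The difference between the first two strategies can be sketched in a few lines of Java. This is only a model under simplifying assumptions: each table is an in-memory Map from primary key to (non-null) row contents, and TableSync, full and differential are hypothetical names, not HA-JDBC classes. The sketch also shows why the differential approach is slow on large tables: every row of both copies is visited, even when only a handful have changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model: a table is a map from primary key to row contents (values are non-null).
final class TableSync {
    final List<Long> inserted = new ArrayList<>();
    final List<Long> updated = new ArrayList<>();
    final List<Long> deleted = new ArrayList<>();

    // "Full" approach: discard the stale contents and copy everything again.
    static void full(Map<Long, String> source, Map<Long, String> target) {
        target.clear();
        target.putAll(source);
    }

    // "Differential" approach: compare every row, then apply only the differences.
    static TableSync differential(Map<Long, String> source, Map<Long, String> target) {
        TableSync diff = new TableSync();
        for (Map.Entry<Long, String> row : source.entrySet()) {
            String stale = target.get(row.getKey());
            if (stale == null) {
                diff.inserted.add(row.getKey());          // missing on the stale copy
            } else if (!stale.equals(row.getValue())) {
                diff.updated.add(row.getKey());           // present but different
            }
        }
        for (Long key : target.keySet()) {
            if (!source.containsKey(key)) {
                diff.deleted.add(key);                    // no longer exists upstream
            }
        }
        for (Long key : diff.inserted) target.put(key, source.get(key));
        for (Long key : diff.updated) target.put(key, source.get(key));
        for (Long key : diff.deleted) target.remove(key);
        return diff;
    }
}
```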

A nice thing about HA-JDBC is that you can implement your own synchronization strategy. Because the strategies above can take hours to complete on large tables, we decided to write one ourselves. Our strategy requires every table to have a timestamp column for versioning and does not support deletes, but it turned out to be a lot faster than HA-JDBC's built-in strategies.
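
A minimal sketch of the timestamp idea, not the actual strategy we wrote, might look like the following. It assumes every row carries a lastModified value taken from a consistent clock; VersionedRow and TimestampSync are hypothetical names. The stale copy's newest timestamp serves as a watermark, and only rows at or after it are copied from the up-to-date database. Against a real database this would roughly correspond to a SELECT ... WHERE last_modified >= ? on the up-to-date database followed by an insert-or-update on the stale one (the column name is hypothetical). Rows deleted during the downtime are never noticed, which is exactly the limitation mentioned above.

```java
import java.util.Map;

// Hypothetical row model: every table carries a last-modified timestamp (the versioning column).
final class VersionedRow {
    final long id;
    final String data;
    final long lastModified; // assumed to come from one consistent clock

    VersionedRow(long id, String data, long lastModified) {
        this.id = id;
        this.data = data;
        this.lastModified = lastModified;
    }
}

final class TimestampSync {
    /**
     * Copies rows changed since the stale copy's newest timestamp and returns how many were copied.
     * Deleted rows are NOT propagated.
     */
    static int synchronize(Map<Long, VersionedRow> source, Map<Long, VersionedRow> target) {
        long watermark = Long.MIN_VALUE; // empty target: copy everything
        for (VersionedRow row : target.values()) {
            watermark = Math.max(watermark, row.lastModified);
        }
        int copied = 0;
        for (VersionedRow row : source.values()) {
            // ">=" re-copies rows sharing the boundary timestamp, in case one of them was missed.
            if (row.lastModified >= watermark) {
                target.put(row.id, row); // insert or update
                copied++;
            }
        }
        return copied;
    }
}
```

Because rows at the watermark are simply copied again, running the synchronization twice is harmless.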

ID GENERATION

When using a single database, you can let the database itself generate IDs for the records you insert. With multiple databases behind HA-JDBC you can still do that, but there is no guarantee that every database in the cluster will generate the same ID for the same record. If the databases generate different IDs, your cluster is left in an invalid state, because its databases now contain different data.
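
One way this can happen: two clients insert a row at the same moment, and the two databases receive the inserts in different orders. The toy model below (AutoIncrementTable is a hypothetical name; each instance stands for one database with its own counter) shows the same two inserts ending up with swapped IDs, and contrasts this with IDs chosen by the application, which every database stores identically.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one database table with its own auto-increment counter.
final class AutoIncrementTable {
    private long nextId = 1;
    final Map<Long, String> rows = new LinkedHashMap<>();

    // The database picks the ID: the result depends on arrival order.
    long insert(String value) {
        long id = nextId++;
        rows.put(id, value);
        return id;
    }

    // The application picks the ID: every database stores the same key, whatever the order.
    void insertWithId(long id, String value) {
        rows.put(id, value);
    }

    // Applies the same inserts in the order in which this database happened to receive them.
    static AutoIncrementTable replay(List<String> arrivalOrder) {
        AutoIncrementTable table = new AutoIncrementTable();
        for (String value : arrivalOrder) {
            table.insert(value);
        }
        return table;
    }
}
```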

There is a solution to this problem, but it isn't pretty. When using an ORM tool like Hibernate, you can specify a generator for the ID field. In a typical Hibernate setup the database is responsible for generating IDs, which is not what we want. When using HA-JDBC you should use one of the following generators instead:

  • UUID generator
  • HiLo generator

Neither of these generators depends on an individual database, which is just what we need, but they do not produce conventional IDs. For example, the UUID generator generates IDs like ‘4028828d-0dc7f2a2-010d-c7f2a4d3-0013’. This value is based on, among other things, the current timestamp and the IP address of the machine the application runs on. The HiLo generator does generate plain numbers, but they don't increase the way you're used to: the first number it generates might be 432 and the next one 33200, which you wouldn't expect from IDs.
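
To see why HiLo IDs jump, here is a minimal sketch of the general hi/lo idea; it is not Hibernate's exact algorithm. A generator fetches a ‘hi’ value from a shared source (modelled as a hypothetical LongSupplier; in practice a counter kept in the database) and then hands out maxLo IDs on its own. Because the application computes the ID, every database in the cluster receives the same value. The block also shows the JDK's UUID.randomUUID(), which produces random (version 4) UUIDs, a different scheme from the IP- and timestamp-based format quoted above.

```java
import java.util.UUID;
import java.util.function.LongSupplier;

// Minimal sketch of the general hi/lo idea; not Hibernate's exact algorithm.
final class HiLoGenerator {
    private final LongSupplier nextHi; // hypothetical shared source, e.g. a counter row in the database
    private final int maxLo;           // block size: IDs handed out per "hi" value
    private long hi;
    private int lo;

    HiLoGenerator(LongSupplier nextHi, int maxLo) {
        this.nextHi = nextHi;
        this.maxLo = maxLo;
        this.lo = maxLo; // forces a "hi" fetch on first use
    }

    synchronized long next() {
        if (lo >= maxLo) {
            hi = nextHi.getAsLong(); // one shared round trip per block of IDs
            lo = 0;
        }
        return hi * maxLo + lo++;
    }
}

// Random (version 4) UUIDs from the JDK, rendered as 36-character strings.
final class UuidIds {
    static String next() {
        return UUID.randomUUID().toString();
    }
}
```

With maxLo = 100 and two application servers sharing one counter, the first server might hand out 100, 101, ... while the second hands out 200, 201, ..., so consecutive rows in a table can easily have IDs that are far apart.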

NON-INTRUSIVE?

When I started using HA-JDBC, I expected it to be non-intrusive to our project, because the only things we needed to do were change our JDBC driver and write a simple XML configuration file for it. But as you have read in this blog entry, first of all you need to write your own synchronization strategy, and second, you will probably have to switch to UUIDs. That requires a lot of refactoring throughout your code, because you are switching from Long-typed IDs to String-typed IDs. So in fact HA-JDBC influences your project more than you would expect.

CONCLUSION

In conclusion, HA-JDBC is very easy to set up and has well-written documentation on its website. It performs quite well, especially with a customized synchronization strategy. Since it delegates calls directly to the underlying JDBC drivers, it is fast and has full JDBC support. You also don't need anything other than your database servers and your application servers. But HA-JDBC has a few issues that are a bit annoying: you will probably end up with UUIDs for the records in your database, and you will have to write your own synchronization strategy.

I'd like to know whether other people have experience with this approach to database clustering and would like to share it. So if you have any experience with HA-JDBC, don't hesitate to leave a comment!
