Sharding the Hibernate Way

http://highscalability.com/sharding-hibernate-way

To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details, it’s not that easy. Hibernate Shards to the rescue! Hibernate Shards is an extension to the core Hibernate product that adds facilities for horizontal partitioning. If you know the core Hibernate API, you know the Shards API, so there is no learning curve at all. Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way, and it’s very much worth looking at, especially if you use Hibernate or some other ORM layer.


Information Sources


Google Developer Podcast Episode Six: The Hibernate Shards Open Source Project. This is the document summarized here.

Hibernate Shards Project Page

Hibernate Shards Dev Discussion Group.

Ryan Barrett’s Scaling on the Cheap presentation. Many of the lessons from here are in Hibernate Shards.


What is Hibernate Shards?


Shard: one piece of a split-up data set. If your data doesn't fit on one machine, you split it up into pieces, and each piece is called a shard.

Sharding: the process of splitting up data.

Sharding is used when you have too much data to fit in one single relational database. If your database has a JDBC adapter that means Hibernate can talk to it and if Hibernate can talk to it that means Hibernate Shards can talk to it.

Hibernate was chosen because it's a good ORM tool used internally at Google, but at Google scale (really, really big) sharding needed to be added because Hibernate didn’t support that sort of scale out of the box.

The learning curve for a Hibernate user is zero because the Hibernate API is the same. The shard implementation hasn’t violated the API (yet).

How does it compare to MySQL's horizontal partitioning? Shards is for situations where you have too much data to fit in a single database. MySQL partitioning may allow you to delay when you need to shard, but it is still a single database and you’ll eventually run into limits.


Schema Design for Shards


When sharding you have to consider the general issues of distributed data design for high data volumes. These aren’t Hibernate Shards specific issues, but are general to the problem space.

Schema design is the most important part of the sharding process and you’ll have to do that up front.

You need to pick a dimension, a root level entity, that is easily sharded. Users and customers are common examples.

Accept the fact that those entities and all the entities that hang off those entities will be stored in separate physical spaces. Querying across different shards will be difficult. As will management and just about anything else you take for granted.
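
To make the root-entity idea concrete, here is a minimal sketch in plain Java. It is not Hibernate Shards code: the class name, the shard count, and the modulo rule are all assumptions chosen for illustration. The point it shows is that a child row is placed by its owner's ID, never by its own ID, so the whole graph for one user lands on one shard.

```java
// Hypothetical sketch; the names and the modulo rule are assumptions,
// not part of Hibernate Shards.
class RootEntityShardingSketch {
    static final int SHARD_COUNT = 4; // assumed number of shards

    // The root entity (a user) is placed by its own ID.
    static int shardOfUser(long userId) {
        return (int) Math.floorMod(userId, (long) SHARD_COUNT);
    }

    // An entity that hangs off a user is placed by the owner's ID,
    // so it always sits next to its root. The order's own ID is ignored.
    static int shardOfOrder(long orderId, long ownerUserId) {
        return shardOfUser(ownerUserId);
    }
}
```

A fixed modulo rule like this is easy to reason about but changes almost every placement when the shard count changes, which is one reason the choice of strategy deserves care up front.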

How data are distributed is controlled by a pluggable strategies layer.

Plan for the future by picking a strategy that will last you a long time. Repartitioning/resharding the data is operationally very difficult. No management tools for this yet.

Build simpler models that don't contain as many relationships because you don't have cross shard relationships. Your object graphs should be contained on one shard as much as possible.

A model with lots and lots of objects pointing to each other may not be a good candidate for sharding.

Because the shards design doesn’t modify Hibernate core, you can design using shards from the start, even though you only have one database. Then when you need to start scaling it will be easier to grow.

Existing systems with shardable tables shouldn’t take very long to get up and running.


The Sharding Code’s Relationship to Hibernate


Shards doesn't have full support for Hibernate’s query interface. Hibernate has both a criteria interface and a query interface. The criteria interface is robust, but it is not a good fit for JPA (Java Persistence API), which is query based.

Sharding should work across all databases Hibernate works on since shards is a layer on top of Hibernate core beneath the standard Hibernate interfaces. Programmers aren’t aware of it.

What they are doing is figuring out how to do standard things like save objects, update, and query objects across multiple databases using standard Hibernate interfaces. If Hibernate can talk to it they can talk to it.

A sharded session is used to contain Hibernate’s sessions so Hibernate capabilities are preserved.
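
The containment idea can be sketched roughly as follows. This is a toy model, not the library: `PlainSession` and `ShardedSessionSketch` are hypothetical names, an in-memory map stands in for each database, and the modulo routing rule is an assumption. It only shows one outer object owning one ordinary session per shard and delegating to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one ordinary per-database session; not a Hibernate class.
class PlainSession {
    private final Map<Long, String> rows = new HashMap<>();

    void save(long id, String value) { rows.put(id, value); }

    String get(long id) { return rows.get(id); }
}

// Hypothetical wrapper: owns one PlainSession per shard and delegates to it,
// so callers work with a single session-like object.
class ShardedSessionSketch {
    private final List<PlainSession> sessions = new ArrayList<>();

    ShardedSessionSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            sessions.add(new PlainSession());
        }
    }

    // Assumed routing rule, for illustration only: ID modulo shard count.
    int shardFor(long id) {
        return (int) Math.floorMod(id, (long) sessions.size());
    }

    void save(long id, String value) { sessions.get(shardFor(id)).save(id, value); }

    String get(long id) { return sessions.get(shardFor(id)).get(id); }
}
```

Because every call ends up in an ordinary per-shard session, whatever that session can do is still available; the wrapper only decides which one to use.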

Cross shard foreign relationships cannot be managed (yet), but there are runtime checks to detect when cross shard relations are used accidentally. For such relationships there is no foreign key constraint checking and no Hibernate lazy loading. From a programming perspective you can have IDs that reference other objects on other shards; it’s just that Hibernate won’t know about these relationships.

Now that the base software is done these more advanced features can be considered. It may take changes in Hibernate core.
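
Referencing by ID might look something like the sketch below. Everything in it is hypothetical (the `Customer` and `Order` classes, and the two maps standing in for two shard databases); it is meant only to show that the application, not the ORM, follows the link and that nothing checks its integrity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; two in-memory maps stand in for two separate shards.
class CrossShardReferenceSketch {
    static class Customer {
        final long id;
        final String name;
        Customer(long id, String name) { this.id = id; this.name = name; }
    }

    // Order keeps only the customer's ID, not an object reference,
    // so no mapped association ever crosses a shard boundary.
    static class Order {
        final long id;
        final long customerId;
        Order(long id, long customerId) { this.id = id; this.customerId = customerId; }
    }

    final Map<Long, Customer> customerShard = new HashMap<>();
    final Map<Long, Order> orderShard = new HashMap<>();

    // The application resolves the reference itself with a second lookup.
    // Nothing enforces integrity: a dangling ID simply yields null.
    Customer customerOf(long orderId) {
        Order order = orderShard.get(orderId);
        return order == null ? null : customerShard.get(order.customerId);
    }
}
```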


Pluggable Strategies Determine How Data Are Split Across Shards


A Strategy dictates how data are spread across the shards. It’s an interface you need to implement. There are three Strategies:
* Shard Resolution Strategy - how you will retrieve your objects.
* Shard Selection Strategy - defines where objects are saved to.
* Access Strategy - once you figure out which shards you are talking to, how do you want to access them (serially, 2 at a time, in parallel, etc.)?
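
As a rough picture of how the three roles divide the work, here is a simplified sketch. The interface names and method signatures below are assumptions made up for this example; the real library's interfaces differ and should be taken from its documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Simplified, hypothetical interfaces; not the library's actual signatures.
interface SelectionSketch {
    int selectShardFor(Object newObject);          // where a new object is saved
}

interface ResolutionSketch {
    List<Integer> candidateShardsFor(long objectId); // where an existing object may live
}

interface AccessSketch {
    // How an operation is run against the chosen shards.
    <T> List<T> apply(List<Integer> shardIds, Function<Integer, List<T>> perShardOperation);
}

// One possible access policy: visit the shards one after another.
class SerialAccess implements AccessSketch {
    public <T> List<T> apply(List<Integer> shardIds, Function<Integer, List<T>> perShardOperation) {
        List<T> merged = new ArrayList<>();
        for (int shardId : shardIds) {
            merged.addAll(perShardOperation.apply(shardId));
        }
        return merged;
    }
}

class StrategyDemo {
    static final int SHARDS = 2; // assumed shard count

    // Assumed rule: place new objects by hash code.
    static final SelectionSketch SELECTION = obj -> Math.floorMod(obj.hashCode(), SHARDS);

    // With no hint about where an ID lives, every shard is a candidate.
    static final ResolutionSketch RESOLUTION = id -> Arrays.asList(0, 1);
}
```

A parallel access policy would implement the same `AccessSketch` shape with a thread pool instead of a loop; the selection and resolution code would not need to change.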

Goal is to have Strategies as flexible as possible so you can decide how your data are sharded.

A couple of implementations are provided out of the box:
* Round Robin - The first object goes to the first shard, the second to the second shard, and so on until it loops back to the first.
* Attribute Based - Look at attributes in the data to determine which shard to use. You can shard users by country, for example.
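
The two approaches could be sketched like this. These are illustrative classes written for this summary, not the implementations that ship with the library; the class names, the country-to-shard map, and the default shard are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector: shard 0, 1, ..., n-1, then back to 0.
class RoundRobinSelector {
    private final int shardCount;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(int shardCount) { this.shardCount = shardCount; }

    int selectShard() {
        return Math.floorMod(next.getAndIncrement(), shardCount);
    }
}

// Hypothetical attribute-based selector: the shard is looked up from the user's country.
class CountrySelector {
    private final Map<String, Integer> shardByCountry;
    private final int defaultShard;

    CountrySelector(Map<String, Integer> shardByCountry, int defaultShard) {
        this.shardByCountry = new HashMap<>(shardByCountry);
        this.defaultShard = defaultShard;
    }

    int selectShard(String country) {
        // Countries without an explicit mapping fall back to the default shard.
        return shardByCountry.getOrDefault(country, defaultShard);
    }
}
```

One trade-off worth noting: round robin spreads writes evenly, but the shard cannot be recomputed from the object later, so a read needs the shard recorded somewhere or has to ask every shard. An attribute-based rule can be recomputed at read time, at the cost of uneven shards if the attribute is skewed.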


Some Limitations


Full Hibernate HQL is not yet supported (maybe it is now, but I couldn’t tell).

Distributed queries are handled by applying a standard HQL query to each shard, merging the results, and applying the filters. This all happens in the application server so using very large data sets could be a problem. It’s left to the intelligence of the developers to do the right thing to manage performance.
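
The fan-out-and-merge pattern looks roughly like the sketch below. It uses plain lists of integers in place of shard databases and a predicate in place of an HQL query, both of which are assumptions for illustration. What it does show accurately is where the cost sits: every matching row from every shard is pulled into application memory before ordering and limiting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch; each inner list stands in for the rows held by one shard.
class FanOutQuerySketch {
    static List<Integer> query(List<List<Integer>> shards, Predicate<Integer> where, int limit) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            // The same filter is applied to every shard.
            merged.addAll(shard.stream().filter(where).collect(Collectors.toList()));
        }
        // Ordering and limiting happen in the application, after the merge,
        // so memory use grows with the total number of matching rows.
        Collections.sort(merged);
        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }
}
```

A narrow filter keeps the merged list small; a broad one over large shards is exactly the case where the application server becomes the bottleneck.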

No mirroring or data replication.

No clean way to manage read only data you want on every shard for performance and referential integrity reasons. Say you have country data. It makes sense to replicate that data on each shard so all queries using that data can stay on the shard.
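
Since there is no built-in support, one do-it-yourself workaround is to write reference rows to every shard from application code. The sketch below assumes in-memory maps in place of the per-shard country tables, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; one map per shard stands in for that shard's country table.
class ReplicatedReferenceDataSketch {
    private final List<Map<String, String>> countriesPerShard = new ArrayList<>();

    ReplicatedReferenceDataSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            countriesPerShard.add(new HashMap<>());
        }
    }

    // Writes go to every shard. The copies are not updated atomically,
    // which is tolerable only because the data rarely changes.
    void putCountry(String code, String name) {
        for (Map<String, String> shard : countriesPerShard) {
            shard.put(code, name);
        }
    }

    // Reads stay on whichever shard the rest of the query is already using.
    String countryName(int shardId, String code) {
        return countriesPerShard.get(shardId).get(code);
    }
}
```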

No handling of fail over situations, which is just like Hibernate. You could handle it in your connection pool or some other layer. It’s not considered part of the shard/OR mapping layer.

There’s a need for management tools that work across shards.

It’s possible to shard across different databases as long as you keep the same schema in each database.


Related Articles


An Unorthodox Approach to Database Design: The Coming of the Shard.
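Comments
#1 javaeyename 2008-03-05
Thanks very much for the source mentioned in this post, http://highscalability.com. The site seems to focus specifically on the design of large-scale websites. I am still studying it.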
