http://highscalability.com/sharding-hibernate-way
To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it’s not that easy. Hibernate Shards to the rescue! Hibernate Shards is an extension to the core Hibernate product that adds facilities for horizontal partitioning. If you know the core Hibernate API you know the Shards API, so there is no learning curve at all. Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way and it’s very much worth looking at, especially if you use Hibernate or some other ORM layer.
Information Sources
Google Developer Podcast Episode Six: The Hibernate Shards Open Source Project. This is the document summarized here.
Hibernate Shards Project Page
Hibernate Shards Dev Discussion Group.
Ryan Barrett’s Scaling on the Cheap presentation. Many of the lessons from here are in Hibernate Shards.
What is Hibernate Shards?
Shard: one piece of a data set that has been split up. If your data doesn't fit on one machine you split it into pieces, and each piece is called a shard.
Sharding: the process of splitting up data.
Sharding is used when you have too much data to fit in one single relational database. If your database has a JDBC adapter that means Hibernate can talk to it and if Hibernate can talk to it that means Hibernate Shards can talk to it.
Hibernate was chosen because it's a good ORM tool used internally at Google, but at Google scale (really, really big) sharding had to be added because Hibernate didn’t support that sort of scale out of the box.
The learning curve for a Hibernate user is zero because the Hibernate API is the same. The shard implementation hasn’t violated the API (yet).
How does it compare to MySQL's horizontal partitioning? Shards is for situations where you have too much data to fit in a single database. MySQL partitioning may allow you to delay when you need to shard, but it is still a single database and you’ll eventually run into limits.
Schema Design for Shards
When sharding you have to consider the general issues of distributed data design for high data volumes. These aren’t Hibernate Shards specific issues, but are general to the problem space.
Schema design is the most important part of the sharding process and you’ll have to do it up front.
You need to pick a dimension, a root level entity, that is easily sharded. Users and customers are common examples.
Accept the fact that those entities and all the entities that hang off those entities will be stored in separate physical spaces. Querying across different shards will be difficult. As will management and just about anything else you take for granted.
Control over how data are distributed is determined by a pluggable strategies layer.
Plan for the future by picking a strategy that will last you a long time. Repartitioning/resharding the data is operationally very difficult, and there are no management tools for it yet.
Build simpler models that don't contain as many relationships, because you don't have cross-shard relationships. Your object graphs should be contained on one shard as much as possible.
A model with lots and lots of objects pointing to each other may not be a good candidate for sharding.
Because the shards design doesn’t modify Hibernate core, you can design using shards from the start, even though you only have one database. Then when you need to start scaling it will be easier to grow.
Existing systems with shardable tables shouldn’t take very long to get up and running.
The Sharding Code’s Relationship to Hibernate
Shards doesn't have full support for Hibernate’s query interface. Hibernate has both a criteria interface and a query interface. The criteria interface is robust, but not good for JPA (Java Persistence API), which is query based.
Sharding should work across all databases Hibernate works on, since Shards is a layer that sits on top of Hibernate core and beneath the standard Hibernate interfaces. Programmers aren’t aware of it.
What they are doing is figuring out how to do standard things like save objects, update, and query objects across multiple databases using standard Hibernate interfaces. If Hibernate can talk to it they can talk to it.
A sharded session is used to contain Hibernate’s sessions so Hibernate capabilities are preserved.
Shards cannot manage cross-shard foreign relationships (yet). It does have runtime checks to detect when cross-shard relations are used accidentally. Across shards there is no foreign key constraint checking and no Hibernate lazy loading. From a programming perspective you can have IDs that reference objects on other shards; it’s just that Hibernate won’t know about these relationships.
Now that the base software is done these more advanced features can be considered. It may take changes in Hibernate core.
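The runtime check and the "plain ID" workaround described above can be pictured with a small sketch. This is not the Hibernate Shards implementation: the `Stored` class, the `shardId` field and the `requireSameShard` helper are hypothetical names invented for illustration, and the sketch assumes each object knows which shard it lives on.

```java
// Illustrative only: hypothetical types, not the Hibernate Shards API.
final class CrossShardCheckSketch {

    /** A persisted object that remembers which shard it lives on. */
    static final class Stored {
        final long id;
        final int shardId;

        Stored(long id, int shardId) {
            this.id = id;
            this.shardId = shardId;
        }
    }

    /** Fails fast if an object reference would cross a shard boundary. */
    static void requireSameShard(Stored owner, Stored related) {
        if (owner.shardId != related.shardId) {
            throw new IllegalStateException(
                "object " + related.id + " is on shard " + related.shardId
                    + " but object " + owner.id + " is on shard " + owner.shardId);
        }
    }

    public static void main(String[] args) {
        Stored order = new Stored(10L, 0);
        Stored lineItem = new Stored(11L, 0);
        Stored customer = new Stored(7L, 1);

        // Same shard: safe to keep as a real object reference.
        requireSameShard(order, lineItem);

        // Different shard: keep only the plain ID, which the ORM does not track.
        long customerId = customer.id;
        System.out.println("order " + order.id + " references customer by ID " + customerId);

        try {
            requireSameShard(order, customer);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the design is that an accidental cross-shard reference is caught when the association is made, rather than surfacing later as a missing row.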
Pluggable Strategies Determine How Data Are Split Across Shards
A Strategy dictates how data are spread across the shards. It’s an interface you need to implement. There are three Strategies:
* Shard Resolution Strategy - how you will retrieve your objects.
* Shard Selection Strategy - where objects are saved.
* Access Strategy - once you figure out which shards you are talking to, how do you want to access them (serially, 2 at a time, in parallel, etc.)?
The goal is to make Strategies as flexible as possible so you can decide how your data are sharded.
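To make the access strategy idea concrete, here is a minimal sketch of running one operation against every shard either serially or a few at a time. It is an assumption-laden illustration rather than the library's interface: `AccessStrategySketch`, `sequential` and `parallel` are hypothetical names, and a shard is represented by an arbitrary type `S`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative only: hypothetical helpers, not the Hibernate Shards API.
final class AccessStrategySketch {

    /** Visits the shards one after another. */
    static <S, R> List<R> sequential(List<S> shards, Function<S, R> operation) {
        List<R> results = new ArrayList<>();
        for (S shard : shards) {
            results.add(operation.apply(shard));
        }
        return results;
    }

    /** Visits at most maxConcurrent shards at a time; results keep shard order. */
    static <S, R> List<R> parallel(List<S> shards, Function<S, R> operation, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (S shard : shards) {
                Callable<R> task = () -> operation.apply(shard);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that shard has answered
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a shard", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard operation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> shards = List.of("shard0", "shard1", "shard2");
        Function<String, Integer> nameLength = String::length;
        System.out.println(sequential(shards, nameLength));
        System.out.println(parallel(shards, nameLength, 2));
    }
}
```

Both variants return the same results; the choice only changes how much load hits the databases at once and how long the caller waits.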
A couple of implementations are provided out of the box:
* Round Robin - First one goes to the first shard, second to the second shard, and then it loops back.
* Attribute Based - Look at attributes in the data to determine which shard to use. You can shard users by country, for example.
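The two out-of-the-box approaches can be sketched in a few lines. The `SelectionStrategy` interface below is a hypothetical stand-in written for this example, not the interface shipped with Hibernate Shards, and objects are modelled as simple attribute maps to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: these types are hypothetical, not the Hibernate Shards API.
final class ShardSelectionSketch {

    /** Decides which shard (by index) a new object is saved to. */
    interface SelectionStrategy {
        int selectShard(Map<String, String> attributes);
    }

    /** Round robin: shard 0, then shard 1, and so on, then back to shard 0. */
    static final class RoundRobin implements SelectionStrategy {
        private final int shardCount;
        private final AtomicInteger counter = new AtomicInteger();

        RoundRobin(int shardCount) {
            if (shardCount <= 0) {
                throw new IllegalArgumentException("shardCount must be positive");
            }
            this.shardCount = shardCount;
        }

        @Override
        public int selectShard(Map<String, String> attributes) {
            // floorMod keeps the index non-negative even after the counter overflows.
            return Math.floorMod(counter.getAndIncrement(), shardCount);
        }
    }

    /** Attribute based: the object's "country" attribute picks the shard. */
    static final class ByCountry implements SelectionStrategy {
        private final Map<String, Integer> shardByCountry;
        private final int defaultShard;

        ByCountry(Map<String, Integer> shardByCountry, int defaultShard) {
            this.shardByCountry = new HashMap<>(shardByCountry);
            this.defaultShard = defaultShard;
        }

        @Override
        public int selectShard(Map<String, String> attributes) {
            return shardByCountry.getOrDefault(attributes.get("country"), defaultShard);
        }
    }

    public static void main(String[] args) {
        SelectionStrategy roundRobin = new RoundRobin(2);
        SelectionStrategy byCountry = new ByCountry(Map.of("DE", 0, "US", 1), 0);
        Map<String, String> user = Map.of("name", "alice", "country", "US");

        System.out.println(roundRobin.selectShard(user)); // 0
        System.out.println(roundRobin.selectShard(user)); // 1
        System.out.println(roundRobin.selectShard(user)); // 0 (loops back)
        System.out.println(byCountry.selectShard(user));  // 1
    }
}
```

Round robin spreads load evenly but gives no way to find an object later without asking every shard; an attribute-based choice lets the same attribute be used again when retrieving the object.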
Some Limitations
Full Hibernate HQL is not yet supported (maybe it is now, but I couldn’t tell).
Distributed queries are handled by applying a standard HQL query to each shard, merging the results, and applying the filters. This all happens in the application server so using very large data sets could be a problem. It’s left to the intelligence of the developers to do the right thing to manage performance.
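The query-each-shard-then-merge pattern looks roughly like the sketch below. It assumes nothing about the real library: `queryAllShards` is a hypothetical helper, a shard is any type `S`, and the "query" is just a function returning that shard's rows. The sort and row limit stand in for the filters applied after merging.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustrative only: a hypothetical helper, not the Hibernate Shards API.
final class ScatterGatherSketch {

    /**
     * Runs the same query on every shard, then merges, sorts and trims the
     * combined rows in application memory.
     */
    static <S, T> List<T> queryAllShards(List<S> shards,
                                         Function<S, List<T>> query,
                                         Comparator<T> order,
                                         int maxResults) {
        List<T> merged = new ArrayList<>();
        for (S shard : shards) {
            merged.addAll(query.apply(shard)); // every shard's rows are held in memory
        }
        merged.sort(order);
        int end = Math.min(maxResults, merged.size());
        return new ArrayList<>(merged.subList(0, end));
    }

    public static void main(String[] args) {
        // Each inner list plays the role of one shard's result set.
        List<List<Integer>> shards = List.of(List.of(7, 1), List.of(5, 3), List.of(2));
        Function<List<Integer>, List<Integer>> selectAll = shard -> shard;
        Comparator<Integer> ascending = Comparator.naturalOrder();

        System.out.println(queryAllShards(shards, selectAll, ascending, 3)); // [1, 2, 3]
    }
}
```

Because all rows from all shards are collected before sorting and trimming, memory use grows with the total result size, which is why large result sets need care.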
No mirroring or data replication.
No clean way to manage read-only data you want on every shard for performance and referential integrity reasons. Say you have country data. It makes sense to replicate that data on each shard so all queries using that data can stay on the shard.
No handling of failover situations, which is just like Hibernate. You could handle it in your connection pool or some other layer. It’s not considered part of the shard/OR mapping layer.
There’s a need for management tools that work across shards.
It’s possible to shard across different databases as long as you keep the same schema in each database.
Related Articles
An Unorthodox Approach to Database Design: The Coming of the Shard.