<Week 1> Findings and thoughts on key-value database clusters (and an exercise in perseverance)


Author	Body
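nkadun | Posts: 87 | Points: 160 | From: Tianjin	Posted: 2010-01-20 | Last edited: 2010-01-20

1. The goal is to find a key-value database cluster solution with the following properties:
- High-performance reads and writes, able to support page views on the order of hundreds of millions
- Flexible scalability, which is also the foundation for the previous item
- Automatic data partitioning, i.e. independent of business logic
- Distributed transactions with different isolation levels
- Data consistency, or partial consistency that eventually becomes full consistency
- A standalone network-layer service, to support an N (database cluster) * M (web cluster) structure, with a binary network protocol
- A convenient client API

2. My initial approach
- Use Memcachedb, with the client API (a modified spymemcached) implementing distributed transactions and isolation
- Have the client API partition the data with consistent hashing

3. Findings about my own approach
- Read/write performance is good
- Scalability is poor: with consistent hashing, when the load gets too high and the number of nodes has to be doubled, the amount of data to migrate is huge (5-star problem)
- Data can be partitioned automatically
- Implementing distributed transactions and isolation in the client API alone is very difficult (5-star problem)
- Consistency is satisfied
- There is a standalone network-layer service, so an N*M architecture is possible, but the binary protocol is not supported (2-star problem)
- The client API is reasonably convenient, e.g. spymemcached

4. Further findings
I am a beginner without much experience in this area, so I could not come up with a solution on my own. The first thing I did was look for existing frameworks and get a rough understanding of them. Following hints from "菜鸟", Robbin and Google, I took a quick look at the following, in addition to the Memcachedb mentioned above:
- Cassandra: I checked whether it could solve the three problems listed in item 3. It scales very well, but it has no distributed transactions.
- Voldemort: it faces the same problem as Cassandra, so it leaves me equally torn.
- Redis: it favors memory-centric storage, which I suspect would be a problem in production. It does not solve distributed transactions either.
- JBossCache: it provides a BDB loader but has no standalone network layer, so it has to be deployed together with an app server (such as JBoss). It has a locking mechanism. Because the configuration is complex, and because of Scalaris below, I did not look into it further.
- Scalaris: the framework that got me excited, because it supports distributed transactions and isolation levels and uses TC as its underlying storage. I am currently studying its scalability and usage. I don't know Erlang, so this is slow going, but I think I can see a glimmer of light...

I am a beginner in this area and my fundamentals are weak, so some of these views may be wrong or incomplete. Corrections are very welcome. Thanks!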
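
To put a rough number on the scalability concern in item 3, here is a minimal Python sketch of client-side consistent hashing with virtual nodes. It is not how spymemcached implements it; the `HashRing` class, the MD5-based hash, the node names and the 100 virtual nodes per server are all assumptions made for illustration. It measures what share of keys change owner when a 4-node cluster is doubled to 8 nodes.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each server gets `vnodes` points."""

    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(old_nodes, new_nodes, keys):
    """Share of keys whose owning node differs between the two clusters."""
    old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
    moved = sum(1 for k in keys if old_ring.node_for(k) != new_ring.node_for(k))
    return moved / len(keys)


# Illustrative data: 10,000 made-up keys, cluster doubled from 4 to 8 nodes.
keys = [f"user:{i}" for i in range(10000)]
old = [f"node{i}" for i in range(4)]
new = old + [f"node{i}" for i in range(4, 8)]
fraction = moved_fraction(old, new, keys)
print(f"{fraction:.0%} of keys change owner when 4 nodes become 8")
```

Under these assumptions roughly half of the keys should change owner, and they move only onto the new nodes. Consistent hashing keeps migration proportional to the capacity added, but doubling a cluster still means migrating about half of the data, which is the cost described in item 3.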

javaeyebird | Posts: 142 | Points: 100 | From: Bird Country	Posted: 2010-01-20 | Last edited: 2010-01-20
How about Voldemort?

jellyfish | Junior member | Posts: 49 | Points: 72	Posted: 2010-01-20
About a year ago I searched all the open source cache implementations (your key-value store is basically a cache) and didn't see anything close enough. On the commercial side there is Tangosol, which was bought by Oracle some time ago and can reach your range. I have used it and it's great: 80 ms to retrieve 250K complex objects in my test, and I sense it doesn't grow linearly as you go up. 80 ms is about the base. It's not hard to come up with a distributed schema for keys. The hard part is the performance.

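nkadun | Posts: 87 | Points: 160 | From: Tianjin	Posted: 2010-01-20
javaeyebird wrote: How about Voldemort?
Apart from having no distributed transactions, it meets the rest of the requirements very well!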

nkadun | Posts: 87 | Points: 160 | From: Tianjin	Posted: 2010-01-20
In reply to jellyfish's post about Tangosol:
Thanks for the reply. I just took a look at Tangosol (now called Coherence). Judging by its documentation, it fits my requirements and goals very well. I can't download a trial for now, so I don't know how its persistent storage is implemented. The other frustrating thing is that it is not open source...

nkadun | Posts: 87 | Points: 160 | From: Tianjin	Posted: 2010-01-20
Following up on my own reply about Tangosol/Coherence:
Oh, Coherence does not provide a standalone network layer; there are examples of integrating it with WebLogic.
BTW: I am starting to doubt my own understanding. Is this a question of Cache versus Cache Server? A Cache Server would mean I can do N*M, while a Cache would be M*M. Can anyone tell me?

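lsc20051426 | Junior member | Posts: 26 | Points: 30 | From: Beijing	Posted: 2010-01-21
I recommend Hypertable + Hadoop: http://www.hypertable.org
Hypertable is a key-value, column-oriented distributed database. I'm not sure whether it meets the original poster's requirements.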

jellyfish | Junior member | Posts: 49 | Points: 72	Posted: 2010-01-21
In reply to nkadun's question about Cache versus Cache Server:
Not sure what you mean by N*M. Tangosol is a cache server, distributed and scalable. Tangosol/Coherence is independent of WebLogic, though it can be used with WebLogic. It has Java/.NET/C/C++ API interfaces, and it has its own network library (jar) that works over UDP/TCP.
Quite a few large financial institutions on Wall Street are using it, with around 100 GB of data. I have talked to the Tangosol folks, and I think they are smart enough to get things right, such as scaling up to 5000 nodes (machines). We needed that size to be convinced it was good.
The downside is that it's commercial, and quite expensive. But in my experience it's well worth it because of the optimization. Another vendor is GemStone. I didn't like them technically from the start (from the sales presentation to the technical questionnaire), so I chose Tangosol for the company I worked for.
Their key calculation (which key goes to which node), how they replicate data, and their serialization are all trade secrets, so I have no clue. I basically use it as a black box. Sorry.

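intih | Junior member | Posts: 16 | Points: 10 | From: Beijing	Posted: 2010-01-21
Tokyo Tyrant + Tokyo Cabinet should meet most of the requirements you listed. Also, as I understand it, you shouldn't demand too much of a key-value database when it comes to transactions. Not sure if I'm right about that, haha. jellyfish above seems to be working in the US.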

Stero | Junior member | Posts: 4 | Points: 30 | From: Beijing	Posted: 2010-01-21
Bigtable/HBase/Hypertable could meet your requirements, except that they are column-based with version control, so they are heavier than a simple key-value store and carry some overhead. Tokyo Tyrant/Tokyo Cabinet or MemcacheDB are more agile, but they lack distribution functions. So my idea is:
1. Use a B-tree instead of hashing, so that keys stay ordered for sharding.
2. A key-value store node should be able to split itself as its data grows. Nodes should write logs so that, if a node fails, its data can be rebuilt on other nodes.
3. Use the same master/slave architecture as Bigtable to manage key shards and shard server assignment, along with meta0 and meta1 indexes for the key shards.
I think this task could even be abstracted into a standard open source project. That's it.
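
Points 1 and 2 above can be sketched as follows. This is a toy, single-process Python illustration under stated assumptions, not Bigtable's or any real store's design: the `RangeShardedStore` class, the in-memory dicts standing in for storage nodes, and the tiny split threshold are all hypothetical, and a sorted list of range boundaries with binary search stands in for the B-tree. The logging from point 2 and the master and meta indexes from point 3 are left out.

```python
import bisect


class RangeShardedStore:
    """Toy store in which each shard owns a contiguous, ordered key range."""

    def __init__(self, max_keys_per_shard=4):
        self.max_keys = max_keys_per_shard
        # lower_bounds[i] is the smallest key that shard i may hold.
        self.lower_bounds = [""]
        # One dict per shard, standing in for a storage node.
        self.shards = [{}]

    def _shard_index(self, key):
        # Last shard whose lower bound is <= key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def put(self, key, value):
        idx = self._shard_index(key)
        self.shards[idx][key] = value
        if len(self.shards[idx]) > self.max_keys:
            self._split(idx)

    def get(self, key):
        return self.shards[self._shard_index(key)].get(key)

    def _split(self, idx):
        # Move the upper half of an overfull shard into a new neighbor shard.
        keys = sorted(self.shards[idx])
        mid = keys[len(keys) // 2]
        upper = {k: self.shards[idx].pop(k) for k in keys if k >= mid}
        self.lower_bounds.insert(idx + 1, mid)
        self.shards.insert(idx + 1, upper)


# Illustrative usage with made-up keys.
store = RangeShardedStore(max_keys_per_shard=4)
for i in range(20):
    store.put(f"key{i:02d}", i)
print(len(store.shards), "shards, lower bounds:", store.lower_bounds)
```

In this sketch a split touches only the overfull shard, so growing the cluster moves half of one shard's keys rather than a share of every node's data, which is the appeal of ordered keys over hashing here.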
