
<Week 1> Findings and Thoughts on Key-Value Database Clusters (Also an Exercise in Perseverance)

Posted: 2010-01-21
OP, you could give GigaSpaces a try.
Posted: 2010-01-21
qaplwsok wrote:
OP, you could give GigaSpaces a try.


Looks like a nice piece of technology. It seems my goal has drifted a bit, or maybe achieving it simply requires a very large system in the first place...
Posted: 2010-01-21   Last edited: 2010-01-21
Either implement it yourself or adapt an existing open-source project.

That said, delivering both high performance and distributed transactions is a real headache.
Posted: 2010-01-21
Douban already has one...
Douban has open-sourced its key-value storage system, BeansDB:
http://code.google.com/p/beansdb/
Posted: 2010-01-21
xianglei wrote:
Either implement it yourself or adapt an existing open-source project.

That said, delivering both high performance and distributed transactions is a real headache.


I've had this headache for a week already, and I expect it to last quite a while longer.
Posted: 2010-01-21   Last edited: 2010-01-21
The news-feed project we've been working on recently uses TC. We have almost no transaction requirements; where transaction-like behavior is needed, you can implement it yourself by writing a log (a rough sketch follows below). You can also look at the projects Sina has open-sourced (http://blog.developers.api.sina.com.cn/?paged=4), which include an implementation of Amazon's Dynamo design.
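For illustration only, here is a minimal sketch of the "get transaction-like behavior by writing a log" idea: append each mutation to a log and flush it before applying it, so a crashed node can be rebuilt by replaying the log. The class and its names are hypothetical, and a plain ConcurrentHashMap stands in for TC.

import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical write-ahead-log wrapper: log first, then apply, replay on restart.
public class LoggedStore {
    private final Map<String, String> store = new ConcurrentHashMap<String, String>();
    private final FileWriter log;

    public LoggedStore(String logPath) throws IOException {
        this.log = new FileWriter(logPath, true); // append mode
    }

    public synchronized void put(String key, String value) throws IOException {
        log.write("PUT\t" + key + "\t" + value + "\n");
        log.flush();            // the log entry is written out before the in-memory update
        store.put(key, value);
    }

    public String get(String key) {
        return store.get(key);
    }

    // Recovery (not shown): read the log line by line and re-apply each PUT.
}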
Posted: 2010-01-21
nkadun wrote:
Stero wrote:
Bigtable/HBase/Hypertable could meet your requirements, except that they are column-based with version control, so they are heavier than a simple key-value store and carry some overhead.

Tokyo Tyrant/Tokyo Cabinet or MemcacheDB, on the other hand, is more agile but lacks distribution features.

So my idea is:
1. A B-tree rather than hashing should be used, to keep keys ordered for sharding.
2. A key-value store node should be able to split itself as data grows. Each node should write logs so that if it fails, its data can be rebuilt on other nodes.
3. Use the same master/slave architecture that Bigtable uses to manage key shards and shard-server assignment, plus meta0 and meta1 indexes for the key shards. I think this part could even be abstracted into a standalone open-source project.

That's it.


I'm trying to understand your ideas, though I can't quite follow you. By the way, why not reply in Chinese? You can obviously read my topic, which is written in Chinese, unless you only have an English input method.


Heh, I've been reading a lot of English lately, and I've seen plenty of English replies on JavaEye, so I've just gotten used to it.

First, I don't quite understand what you mean by "distributed transaction support with different isolation levels"; it doesn't seem very clearly specified. I think transactions and isolation conflict somewhat with optimizing for performance at the design level, and the application layer should handle data isolation and synchronization. In existing key-value implementations, operations like memcached's incr and cas can help the application layer solve many transaction-like problems (a sketch follows below).
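As an illustration, a rough sketch of how the application layer can build an optimistic, transaction-like update on top of memcached's cas. I'm assuming the spymemcached client here; the key name and the values are made up.

import java.net.InetSocketAddress;
import net.spy.memcached.CASResponse;
import net.spy.memcached.CASValue;
import net.spy.memcached.MemcachedClient;

public class CasUpdateExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));
        String key = "user:42:balance";          // illustrative key
        client.add(key, 0, "100").get();         // seed a value if the key is absent

        // Optimistic update: read the value with its CAS token, modify it,
        // and write back only if nobody changed it in between; otherwise retry.
        while (true) {
            CASValue<Object> current = client.gets(key);
            int updated = Integer.parseInt((String) current.getValue()) - 10;
            CASResponse r = client.cas(key, current.getCas(), String.valueOf(updated));
            if (r == CASResponse.OK) {
                break;
            }
            // CASResponse.EXISTS means a concurrent writer won the race: loop and retry.
        }
        client.shutdown();
    }
}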

So I'll focus on discussing a distributed, high-performance key-value database:

1. The whole database should be sharded, with the shards served by many shard servers in the cluster.
2. When the data in a shard grows beyond a certain size, the shard should be split to maintain high performance.
3. Each shard server should write log files; when a shard server fails, other shard servers can rebuild the contents of its shards from the logs. The log files should be distributed across different machines to avoid a single point of failure. GFS and HDFS can serve this purpose, but their response times are a bit long; rsync could be used instead to replicate the logs.
4. A master node records the mapping between each shard and its shard server. Clients send requests to the appropriate shard server based on each shard's start/end key range (for B+-tree storage) or key-hash range (for hash storage); a minimal routing sketch follows this list. The master also handles reassigning shards when a shard server fails.
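A minimal sketch, under my own assumptions, of what the client-side routing in point 4 could look like for the range-based layout: the master publishes a map from shard start keys to shard servers, and the client picks the server whose range covers the key. The class, method names, and server addresses are all illustrative.

import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side routing table for range-sharded keys.
// The master node would populate it: shard start key -> shard server address.
public class ShardRouter {
    private final TreeMap<String, String> routes = new TreeMap<String, String>();

    public void addShard(String startKey, String serverAddress) {
        routes.put(startKey, serverAddress);
    }

    // The owning shard is the one with the greatest start key <= key.
    public String serverFor(String key) {
        Map.Entry<String, String> entry = routes.floorEntry(key);
        return entry != null ? entry.getValue() : routes.firstEntry().getValue();
    }
}

// Usage (illustrative):
//   router.addShard("",  "shard-server-1:1978");
//   router.addShard("m", "shard-server-2:1978");
//   router.serverFor("foo");    // -> shard-server-1:1978
//   router.serverFor("zebra");  // -> shard-server-2:1978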

This is essentially the cluster model of Google Bigtable/Hadoop HBase, except that the data is simplified to a memory-based key-value model.
The problems with Bigtable and HBase are:
1. The data model has columns, and every cell is versioned, which is overhead for a simple high-performance key-value store.
2. Storage sits on the network-based GFS/HDFS, which adds noticeable latency. Disk-based log backup is all we need.

So the idea is to reuse Bigtable's cluster model as-is, with the nodes replaced by TT+TC plus a custom log-file backup mechanism. That's my plan.
Reference:
http://labs.google.com/papers/bigtable-osdi06.pdf
Posted: 2010-01-21
xianglei wrote:
The news-feed project we've been working on recently uses TC. We have almost no transaction requirements; where transaction-like behavior is needed, you can implement it yourself by writing a log. You can also look at the projects Sina has open-sourced (http://blog.developers.api.sina.com.cn/?paged=4), which include an implementation of Amazon's Dynamo design.

I'll take a look at that.
Posted: 2010-01-21
Stero wrote:
(full reply quoted above)

Still learning here~~
Posted: 2010-01-21
OK, it's getting intense now. Be warned, this is going to cause a lot of headaches :-).

I can't type in Chinese, and even if I could, I'd only type as slowly as a turtle, so bear with me.

I had two posts, one is here: http://jellyfish.iteye.com/admin/blogs/100840
another is here:
http://www.cjsdn.net/post/view?bid=62&id=195175&sty=1&tpg=1&age=0

Updates:
1. Tried Terracotta; if we go distributed, it becomes commercial. There was a war going on between Terracotta and Tangosol (it looked like Terracotta started it). From the exchange of fire, I got the sense that Terracotta is far behind (~4 years behind).

2. I suddenly realized that the locks have to be distributed as well. Damn, more salt in the wound.

3. Here is the list of the remaining features I think are useful (copied from the cjsdn link):

-----------------------------

If I understand distributed caching correctly, the following are the key points of a distributed cache solution:
1. clustered: all the JVMs carry the same content, replicated.
2. fault tolerant: more than one copy in the cache. Normally this feature and #1 are implemented together, i.e., users can specify how many copies they want across the entire cache.
3. scalable: add machines/JVMs to expand memory, not just add more copies. A related question is whether this is transparent to users, i.e., whether they need to change code when new cache servers are added (see the sketch after this list).
4. network protocols: UDP/TCP
5. transactions
6. backup to files/databases asynchronously.
7. API for other languages, JDBC/ODBC drivers
8. SQL language manipulations.
9. near cache/far cache memory management.
10. locking
11. distributed event handling, such as JMS. This is essential for intersystem updates.
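For what it's worth, a small sketch of how points #2 and #3 often fit together in practice, using consistent hashing (my own illustration, not something from the list above): each key is placed on N nodes chosen from a hash ring, so adding a JVM grows total capacity without client code changes. The node layout and the toy hash are made up.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative consistent-hash ring: N copies per key (fault tolerance, #2),
// and adding nodes grows total memory rather than the copy count (scalability, #3).
public class CacheRing {
    private final TreeMap<Integer, String> ring = new TreeMap<Integer, String>();

    public void addNode(String node) {
        // One position per node here; real rings add many virtual nodes per server.
        ring.put(hash(node), node);
    }

    // Walk clockwise from the key's position, wrapping around, until N distinct nodes are found.
    public List<String> nodesFor(String key, int copies) {
        List<String> owners = new ArrayList<String>();
        List<String> ordered = new ArrayList<String>(ring.tailMap(hash(key)).values());
        ordered.addAll(ring.values()); // wrap around the ring
        for (String node : ordered) {
            if (!owners.contains(node)) {
                owners.add(node);
            }
            if (owners.size() == copies) {
                break;
            }
        }
        return owners;
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // toy hash; real rings use MD5 or murmur
    }
}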

A big memory cache is the main course, because the database is limited and slow without customization. Hooking lots of small machines together into one giant memory is the way to go. Google and Tangosol did it right. But the idea is very old: I used to play a game called MOM (Master of Orion, version 1, from the early '90s) that used the same idea, building thousands of small ships and stacking them together to form a giant monster. I talked with my fellow gamers about this idea back then; someone was bound to do it right one day.


Just my experience; hopefully it's useful for you (or scares you away from it :-)).

