Perhaps you’re considering using a dedicated key-value or document
store instead of a traditional relational database. Reasons for this
might include:
Whatever your reasons, there are a lot of options to choose from. At
Last.fm we do a lot of batch computation in Hadoop, then dump it out to
other machines where it’s indexed and served up over HTTP and Thrift
as an internal service (stuff like ‘most popular songs in London, UK
this week’ etc). Presently we’re using a home-grown index format which
points into large files containing lots of data spanning many keys,
similar to the Haystack approach mentioned in this article about Facebook photo storage.
It works, but rather than build our own replication and partitioning
system on top of this, we are looking to potentially replace it with a
distributed, resilient key-value store for reasons 4, 5 and 6 above.
This article represents my notes and research to date on distributed
key-value stores (and some other stuff) that might be suitable as RDBMS
replacements under the right conditions. I’m expecting to try some of
these out and investigate further in the coming months.
Here is a list of projects that could potentially replace a group of
relational database shards. Some of these are much more than key-value
stores, and aren’t suitable for low-latency data serving, but are
interesting nonetheless.
Name | Language | Fault-tolerance | Persistence | Client Protocol | Data model | Docs | Community
Project Voldemort | Java | partitioned, replicated, read-repair | Pluggable: BerkeleyDB, MySQL | Java API | Structured / blob / text | A | LinkedIn, no
Ringo | Erlang | partitioned, replicated, immutable | Custom on-disk (append only log) | HTTP | blob | B | Nokia, no
Scalaris | Erlang | partitioned, replicated, paxos | In-memory only | Erlang, Java, HTTP | blob | B | OnScale, no
Kai | Erlang | partitioned, replicated? | On-disk Dets file | Memcached | blob | C | no
Dynomite | Erlang | partitioned, replicated | Pluggable: couch, dets | Custom ascii, Thrift | blob | D+ | Powerset, no
MemcacheDB | C | replication | BerkeleyDB | Memcached | blob | B | some
ThruDB | C++ | Replication | Pluggable: BerkeleyDB, Custom, MySQL, S3 | Thrift | Document oriented | C+ | Third rail, unsure
CouchDB | Erlang | Replication, partitioning? | Custom on-disk | HTTP, json | Document oriented (json) | A | Apache, yes
Cassandra | Java | Replication, partitioning | Custom on-disk | Thrift | Bigtable meets Dynamo | F | Facebook, no
HBase | Java | Replication, partitioning | Custom on-disk | Custom API, Thrift, Rest | Bigtable | A | Apache, yes
Hypertable | C++ | Replication, partitioning | Custom on-disk | Thrift, other | Bigtable | A | Zvents, Baidu, yes
Why 5 of these aren’t suitable
What I’m really looking for is a low latency, replicated,
distributed key-value store. Something that scales well as you feed it
more machines, and doesn’t require much setup or maintenance - it
should just work. The API should be that of a simple hashtable:
set(key, val), get(key), delete(key). This would dispense with the
hassle of managing a sharded / replicated database setup, and hopefully
be capable of serving up data by primary key efficiently.
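To make the "simple hashtable" requirement concrete, here is a minimal sketch of the interface I have in mind. The names `KeyValueStore` and `InMemoryStore` are made up for illustration and don't correspond to any of the projects listed; the in-memory class is only a single-process stand-in for something that would really be partitioned and replicated.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface: just the three operations named above."""

    def set(self, key: str, val: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Single-process stand-in; a real store would partition and replicate."""

    def __init__(self) -> None:
        self._data = {}  # key -> value

    def set(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str) -> Optional[bytes]:
        # A missing key is reported as None rather than raising.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a key that isn't there is a no-op.
        self._data.pop(key, None)
```

Anything that satisfies this interface could, in principle, be swapped in behind the same client code.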
Five of the projects on the list are far from being simple key-value
stores, and as such don’t meet the requirements - but they are
definitely worth a mention.
1) We’re already heavy users of Hadoop, and have been experimenting with HBase
for a while. It’s much more than a KV store, but latency is too great
to serve data to the website. We will probably use HBase internally for
other stuff though - we already have stacks of data in HDFS.
2) Hypertable provides a similar feature set
to HBase (both are inspired by Google’s Bigtable). They recently
announced a new sponsor, Baidu - the biggest Chinese search engine.
Definitely one to watch, but doesn’t fit the low-latency KV store bill
either.
3) Cassandra sounded very promising when the
source was released by Facebook last year. They use it for inbox
search. It’s Bigtable-esque, but uses a DHT so doesn’t need a central
server (one of the Cassandra developers previously worked at Amazon on
Dynamo). Unfortunately it’s languished in relative obscurity since
release, because Facebook never really seemed interested in it as an
open-source project. From what I can tell there isn’t much in the way
of documentation or a community around the project at present.
4) CouchDB is an interesting one - it’s a
“distributed, fault-tolerant and schema-free document-oriented database
accessible via a RESTful HTTP/JSON API”. Data is stored in ‘documents’,
which are essentially key-value maps themselves, using the data types
you see in JSON. Read the CouchDB Technical Overview
if you are curious how the web’s trendiest document database works under the hood. This article on the Rules of Database App Aging
goes some way to explaining why document-oriented databases make sense.
CouchDB can do full text indexing of your documents, and lets you
express views over your data in JavaScript. I could imagine using
CouchDB to store lots of data on users: name, age, sex, address, IM
name and lots of other fields, many of which could be null, and each
site update adds or changes the available fields. In situations like
that it quickly gets unwieldy adding and changing columns in a
database, and updating versions of your application code to match.
Although many people are using CouchDB in production, their FAQ points
out they may still make backwards-incompatible changes to the storage
format and API before version 1.0.
5) ThruDB is a document storage and indexing
system made up of four components: a document storage service,
indexing service, message queue and proxy. It uses Thrift for
communication, and has a pluggable storage subsystem, including an
Amazon S3 option. It’s designed to scale well horizontally, and might
be a better option than CouchDB if you are running on EC2. I’ve heard a
lot more about CouchDB than ThruDB recently, but it’s definitely worth
a look if you need a document database. It’s not suitable for our needs
for the same reasons as CouchDB.
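To illustrate the document model described for CouchDB in item 4, here is a rough sketch of two user records as JSON-style key-value maps. The field names and values are invented for the example (only `_id` is a real CouchDB convention), and nothing here talks to an actual database; the point is just that two documents in the same collection need not share the same fields.

```python
import json

# Two hypothetical user documents; field names are invented for illustration.
user_v1 = {"_id": "user:alice", "name": "Alice", "age": 30}
user_v2 = {
    "_id": "user:bob",
    "name": "Bob",
    "im_name": "bob@example.org",
    "address": {"city": "London", "country": "UK"},
}


def fields_in_use(docs):
    """Collect every top-level field name seen across a set of documents."""
    return sorted({field for doc in docs for field in doc})


# Each document serialises to JSON on its own, with no shared schema.
for doc in (user_v1, user_v2):
    print(json.dumps(doc, sort_keys=True))
```

Adding a new field in a later site update means writing it into new documents, with no column to add and no existing rows to migrate.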
Distributed key-value stores
The rest are much closer to being ‘simple’ key-value stores with low
enough latency to be used for serving data used to build dynamic pages.
Latency will be dependent on the environment, and whether or not the
dataset fits in memory. If it does, I’d expect sub-10ms response time,
and if not, it all depends on how much money you spent on spinning rust.
MemcacheDB is essentially just memcached that saves stuff to
disk using a Berkeley database. As useful as this may be for some
situations, it doesn’t deal with replication and partitioning
(sharding), so it would still require a lot of work to make it scale
horizontally and be tolerant of machine failure. Other memcached
derivatives such as repcached
go some way to addressing this by giving you the ability to replicate
entire memcache servers (async master-slave setup), but without
partitioning it’s still going to be a pain to manage.
Project Voldemort looks awesome. Go and read the rather splendid website,
which explains how it works, and includes pretty diagrams and a good
description of how consistent hashing is used in the Design section.
(If consistent hashing butters your muffin, check out libketama - a consistent hashing library
and the Erlang libketama driver).
Project Voldemort handles replication and partitioning of data, and
appears to be well written and designed. It’s reassuring to read in the
docs how easy it is to swap out and mock different components for
testing. It’s non-trivial to add nodes to a running cluster, but
according to the mailing-list this is being worked on. It sounds like
this would fit the bill if we ran it with a Java load-balancer service
(see their Physical Architecture Options diagram) that exposed a Thrift
API so all our non-Java clients could use it.
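For anyone who hasn't met consistent hashing before, here is a toy sketch of the general idea in Python. To be clear, this is not Project Voldemort's or libketama's implementation: the `HashRing` class, the use of SHA-256, and the `vnodes` and `replicas` parameters are all my own assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _point(label):
        # First 8 bytes of a SHA-256 digest as an integer position on the ring.
        digest = hashlib.sha256(label.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node, vnodes=100):
        # Each node owns many points so load spreads evenly.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def nodes_for(self, key, replicas=1):
        """Walk clockwise from the key's point, collecting distinct nodes."""
        start = bisect.bisect(self._ring, (self._point(key), ""))
        found = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found
```

The property that matters is that adding a node only moves the keys that now land on the new node; everything else stays where it was. That is what makes growing a cluster cheaper than rehashing with `hash(key) % num_nodes`.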
Scalaris is probably the most face-meltingly awesome thing
you could build in Erlang. CouchDB, Ejabberd and RabbitMQ are cool, but
Scalaris packs by far the most impressive collection of sexy
technologies. Scalaris is a key-value store - it uses a modified
version of the Chord algorithm to form a DHT, and stores the keys in
lexicographical order, so range queries are possible. Although I didn’t
see this explicitly mentioned, this should open up all sorts of
interesting options for batch processing - map-reduce for example. On
top of the DHT they use an improved version of Paxos
to guarantee ACID properties when dealing with multiple concurrent
transactions. So it’s a key-value store, but it can guarantee the ACID
properties and do proper distributed transactions over multiple keys.
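To show why keeping keys in lexicographical order makes range queries possible, here is a single-node sketch. It says nothing about how Scalaris actually does it across a DHT; `SortedKeyStore` and its methods are hypothetical names, and a plain sorted list stands in for the distributed structure.

```python
import bisect


class SortedKeyStore:
    """Toy single-node store that keeps keys in lexicographical order."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._vals = {}  # key -> value

    def set(self, key, val):
        # Only insert into the sorted list the first time we see a key.
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = val

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]
```

With a hash-partitioned store, neighbouring keys end up scattered across nodes, so a scan like `range("user:", "user;")` would have to touch every node. With ordered keys it is a contiguous slice.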
Oh, and to demonstrate how you can scale a webservice based on such
a system, the Scalaris folk implemented their own version of Wikipedia
on Scalaris, loaded in the Wikipedia data, and benchmarked their setup
to prove it can do more transactions/sec on equal hardware than the
classic PHP/MySQL combo that Wikipedia use. Yikes.
From what I can tell, Scalaris is only memory-resident at the moment
and doesn’t persist data to disk. This makes it entirely impractical to
actually run a service like Wikipedia on Scalaris for real - but it
sounds like they tackled the hard problems first, and persisting to
disk should be a walk in the park after you rolled your own version of
Chord and made Paxos your bitch. Take a look at this presentation about
Scalaris from the Erlang Exchange conference: Scalaris presentation video.
The remaining projects, Dynomite, Ringo and Kai
are all, more or less, trying to be Dynamo. Of the three, Ringo
looks to be the most specialist - it makes a distinction between small
(less than 4KB) and medium-size data items (<100MB). Medium sized
items are stored in individual files, whereas small items are all
stored in an append-log, the index of which is read into memory at
startup. From what I can tell, Ringo can be used in conjunction with
the Erlang map-reduce framework Nokia are working on called Disco.
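The append-log-plus-in-memory-index idea is easy to sketch. The following is not Ringo's on-disk format, which I haven't looked at in detail; the record layout (two big-endian 32-bit lengths followed by key and value) and the `AppendLog` class are assumptions made purely for illustration.

```python
import os
import struct


class AppendLog:
    """Toy append-only log: each record is [key_len][val_len][key][val]."""

    HEADER = struct.Struct(">II")  # two unsigned 32-bit big-endian lengths

    def __init__(self, path):
        self._path = path
        self._index = {}  # key -> (value offset, value length)
        open(path, "ab").close()  # create the file if it is missing
        self._load_index()

    def _load_index(self):
        # Scan the whole log once at startup; later records win.
        with open(self._path, "rb") as f:
            while True:
                header = f.read(self.HEADER.size)
                if len(header) < self.HEADER.size:
                    break
                key_len, val_len = self.HEADER.unpack(header)
                key = f.read(key_len)
                self._index[key] = (f.tell(), val_len)
                f.seek(val_len, os.SEEK_CUR)  # skip the value itself

    def put(self, key, val):
        with open(self._path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(self.HEADER.pack(len(key), len(val)) + key + val)
        self._index[key] = (offset + self.HEADER.size + len(key), len(val))

    def get(self, key):
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

Writes are sequential appends and a read is one seek, but the index has to fit in memory and startup time grows with the size of the log, which fits with small items being the ones kept this way.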
I didn’t find out much about Kai other than that it’s rather new,
plus some mentions in Japanese. You can choose either Erlang ets or dets
as the storage system (memory or disk, respectively), and it uses the
memcached protocol, so it will already have client libraries in many
languages.
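Part of the appeal of the memcached protocol is how little there is to it. Here is a rough sketch of the text-protocol `set` and `get` messages as plain byte strings; the helper names are my own, and I haven't checked how much of the protocol Kai (or MemcacheDB) actually implements.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(resp: bytes):
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if resp == b"END\r\n":
        return None
    # Reply shape: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    header, rest = resp.split(b"\r\n", 1)
    _, _key, _flags, size = header.split(b" ")
    return rest[: int(size)]
```

In practice you would use an existing client library rather than hand-rolling this, which is exactly the advantage of a store speaking a protocol people already have clients for.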
Dynomite doesn’t have great documentation, but it seems to be
more capable than Kai, and is under active development. It has
pluggable backends including the storage mechanism from CouchDB, so the
2GB file limit in dets won’t be an issue. Also I heard that Powerset
are using it, so that’s encouraging.
Summary
Scalaris is fascinating, and I hope I can find the time to
experiment more with it, but it needs to save stuff to disk before it’d
be useful for the kind of things we might use it for at Last.fm.
I’m keeping an eye on Dynomite - hopefully more information will
surface about what Powerset are doing with it, and how it performs at a
large scale.
Based on my research so far, Project Voldemort looks like the most
suitable for our needs. I’d love to hear more about how it’s used at
LinkedIn, and how many nodes they are running it on.
What else is there?
Here are some other related projects:
If you know of anything I’ve missed off the list, or have any
feedback/suggestions, please post a comment. I’m especially interested
in hearing about people who’ve tested or are using KV-stores in lieu of
relational databases.
UPDATE 1:
Corrected table: memcachedb does replication, as per BerkeleyDB.