While SQL databases are insanely useful tools, their tyranny of ~15 years is coming to an end.
And it's about time: I can't even count the things that were forced into relational databases but never really fit them.
But the differences between "NoSQL" databases are much bigger than they ever were between one SQL database and another. This puts a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning.
In this light, here is a comparison of Cassandra, MongoDB, CouchDB, Redis, Riak and HBase:
CouchDB
- Written in: Erlang
- Main point: DB consistency, ease of use
- License: Apache
- Protocol: HTTP/REST
- Bi-directional (!) replication,
- continuous or ad-hoc,
- with conflict detection,
- thus, master-master replication. (!)
- MVCC - write operations do not block reads
- Previous versions of documents are available
- Crash-only (reliable) design
- Needs compacting from time to time
- Views: embedded map/reduce
- Formatting views: lists & shows
- Server-side document validation possible
- Authentication possible
- Real-time updates via _changes (!)
- Attachment handling
- thus, CouchApps (standalone JS apps)
- jQuery library included
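To make the "Views: embedded map/reduce" point concrete, here is a minimal Python sketch of what a view computes. It is an in-memory simulation, not CouchDB's API. Real views are usually written in JavaScript, stored in design documents and indexed incrementally, whereas this toy recomputes everything on every call. The documents and the names `map_orders`, `reduce_sum` and `run_view` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical documents; in CouchDB these would be JSON documents in a database.
docs = [
    {"_id": "a", "type": "order", "customer": "acme", "total": 30},
    {"_id": "b", "type": "order", "customer": "globex", "total": 12},
    {"_id": "c", "type": "order", "customer": "acme", "total": 8},
    {"_id": "d", "type": "note", "text": "not an order"},
]


def map_orders(doc):
    # Like a CouchDB map function: emit zero or more (key, value) rows per document.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_sum(values):
    # Plays the role of CouchDB's built-in _sum reducer.
    return sum(values)


def run_view(docs, map_fn, reduce_fn=None, group=True):
    # Rows are ordered by key; Python's stable sort keeps ties in document order.
    rows = sorted((row for doc in docs for row in map_fn(doc)), key=lambda row: row[0])
    if reduce_fn is None:
        return rows
    if not group:
        return reduce_fn([value for _, value in rows])
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


print(run_view(docs, map_orders, reduce_sum))  # {'acme': 38, 'globex': 12}
```

Because a view is a function defined ahead of time and run over every document, it suits queries you know in advance. An ad-hoc question needs a new view.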
Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
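Here is a rough sketch of how revision histories let master-master replication detect conflicts instead of silently losing writes. It assumes a simplified model:

- Each revision id has the form `N-digest`.
- When one replica holds a descendant of the other's revision, the descendant simply wins.
- Concurrent edits are resolved by a deterministic rule (longer history first, then higher revision id), and the losing revision is kept rather than discarded.

CouchDB does something in this spirit, but `Doc`, `update` and `merge` are hypothetical names, not its API.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    body: dict
    history: list[str] = field(default_factory=list)  # revision ids, oldest first

    @property
    def rev(self) -> str:
        return self.history[-1]


def update(doc: Doc, body: dict) -> Doc:
    # MVCC: a write never overwrites; it adds revision "N-digest" on top of the current one.
    prev = doc.history[-1] if doc.history else ""
    digest = hashlib.md5((prev + json.dumps(body, sort_keys=True)).encode()).hexdigest()[:8]
    return Doc(body, doc.history + [f"{len(doc.history) + 1}-{digest}"])


def merge(a: Doc, b: Doc) -> tuple[Doc, list[Doc]]:
    """Reconcile two replicas of one document; returns (winner, conflicting losers)."""
    if a.history[: len(b.history)] == b.history:
        return a, []  # b is an ancestor of a: just fast-forward
    if b.history[: len(a.history)] == a.history:
        return b, []
    # Both masters edited the same revision concurrently: a conflict.
    # Every replica applies the same deterministic rule, so all of them pick the same
    # winner, and the loser is kept for the application to resolve.
    winner, loser = sorted([a, b], key=lambda d: (len(d.history), d.rev), reverse=True)
    return winner, [loser]


base = update(Doc({}), {"name": "acme"})
site1 = update(base, {"name": "ACME Corp"})  # edited in one data center
site2 = update(base, {"name": "Acme Inc."})  # edited concurrently in another
winner, conflicts = merge(site1, site2)
print(winner.body, len(conflicts))
```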
Redis
- Written in: C
- Main point: Blazing fast
- License: BSD
- Protocol: Telnet-like
- Disk-backed in-memory database,
- but since 2.0, it can swap to disk.
- Master-slave replication
- Simple keys and values,
- but complex operations like ZREVRANGEBYSCORE
- INCR & co (good for rate limiting or statistics)
- Has sets (also union/diff/inter)
- Has lists (also a queue; blocking pop)
- Has hashes (objects of multiple fields)
- Of all these databases, only Redis does transactions (!)
- Values can be set to expire (as in a cache)
- Sorted sets (high score table, good for range queries)
- Pub/Sub and WATCH on data changes (!)
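A sorted set is the structure behind commands like ZREVRANGEBYSCORE and the "high score table" use case. The class below is a toy, in-process model of its semantics, not a Redis client. Real Redis keeps a hash table plus a skip list for larger sets instead of re-sorting on every query. The method names only loosely mirror ZADD, ZINCRBY and ZREVRANGEBYSCORE. Against a real server, the final query would be roughly `ZREVRANGEBYSCORE scores +inf 100 LIMIT 0 3`.

```python
class SortedSet:
    """In-process toy with Redis sorted-set semantics (not a Redis client)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        self._scores[member] = score  # adding an existing member just updates its score

    def zincrby(self, member, amount):
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrangebyscore(self, max_score, min_score, offset=0, count=None):
        # Members with min_score <= score <= max_score, highest score first;
        # equal scores come out in reverse lexicographic order, as with ZREVRANGEBYSCORE.
        hits = [(m, s) for m, s in self._scores.items() if min_score <= s <= max_score]
        hits.sort(key=lambda ms: (ms[1], ms[0]), reverse=True)
        end = None if count is None else offset + count
        return hits[offset:end]


scores = SortedSet()
for player, points in [("alice", 120), ("bob", 95), ("carol", 120), ("dave", 40)]:
    scores.zadd(player, points)
scores.zincrby("dave", 70)  # dave now has 110
print(scores.zrevrangebyscore(float("inf"), 100, count=3))
# [('carol', 120), ('alice', 120), ('dave', 110)]
```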
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
For example: Stock prices. Analytics. Real-time data collection. Real-time communication.
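Combining INCR with key expiry is the classic way to count things per time window, for example for rate limiting or per-minute statistics. This sketch models just those two commands in memory, with an injectable clock so it can be tested. `ToyKV` and `allow_request` are hypothetical names. With a real server, the pattern is INCR on a key and, when the result is 1, EXPIRE on it. Production code usually makes the pair atomic (for example with MULTI/EXEC), so a crash between the two commands cannot leave a counter that never expires.

```python
import time


class ToyKV:
    """In-process stand-in for Redis INCR and EXPIRE (not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}
        self._expires_at = {}

    def _drop_if_expired(self, key):
        deadline = self._expires_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expires_at.pop(key, None)

    def incr(self, key):
        # Like INCR: a missing (or expired) key counts as 0 before incrementing.
        self._drop_if_expired(key)
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def expire(self, key, seconds):
        self._expires_at[key] = self._clock() + seconds


def allow_request(kv, client, limit, window_seconds):
    """Fixed-window rate limiter: at most `limit` calls per client per window."""
    key = f"rate:{client}"
    count = kv.incr(key)
    if count == 1:
        kv.expire(key, window_seconds)  # the first hit of a window starts its countdown
    return count <= limit


now = [0.0]  # fake clock so the example is deterministic
kv = ToyKV(clock=lambda: now[0])
print([allow_request(kv, "alice", limit=3, window_seconds=60) for _ in range(4)])  # [True, True, True, False]
now[0] = 61.0  # the window has expired
print(allow_request(kv, "alice", limit=3, window_seconds=60))  # True
```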
MongoDB
- Written in: C++
- Main point: Retains some friendly properties of SQL (query, index)
- License: AGPL (Drivers: Apache)
- Protocol: Custom, binary (BSON)
- Master/slave replication
- Queries are JavaScript expressions
- Run arbitrary JavaScript functions server-side
- Better update-in-place than CouchDB
- Sharding built-in
- Uses memory mapped files for data storage
- Performance over features
- After a crash, it needs to repair tables
- Better durability coming in V1.8
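To illustrate "Sharding built-in", here is a sketch of range-based routing on a shard key. The key space is split into chunks, each chunk is owned by a shard, and a router sends a query only to the shards whose chunks it touches. In MongoDB the router is `mongos` and the chunk map lives on config servers. The chunk boundaries, shard names and functions below are invented for the example. MongoDB also offers hashed shard keys, which this sketch does not model.

```python
import bisect

# Hypothetical chunk map for shard key "user_id": chunk i covers [LOWER[i], LOWER[i + 1]).
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000, 20000]
CHUNK_OWNER = ["shard-a", "shard-b", "shard-a", "shard-c"]


def shard_for(user_id):
    # The owning chunk is the last one whose lower bound is <= the key.
    return CHUNK_OWNER[bisect.bisect_right(CHUNK_LOWER_BOUNDS, user_id) - 1]


def shards_for_range(lo, hi):
    # A range query on the shard key only visits chunks overlapping [lo, hi].
    first = bisect.bisect_right(CHUNK_LOWER_BOUNDS, lo) - 1
    last = bisect.bisect_right(CHUNK_LOWER_BOUNDS, hi) - 1
    return set(CHUNK_OWNER[first : last + 1])


print(shard_for(42), shard_for(1234))  # shard-a shard-b
print(sorted(shards_for_range(1500, 6000)))  # ['shard-a', 'shard-b']
```

A query that does not constrain the shard key cannot be routed this way and has to be sent to every shard. This is why the choice of shard key matters so much.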
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
For example: For all things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.
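MongoDB's dynamic queries are typically written as query documents such as `{"age": {"$gte": 30}}`. The toy matcher below evaluates a small subset of those operators (`$gt`, `$gte`, `$lt`, `$lte`, `$in`) against in-memory dicts. It shows why a schemaless collection still feels close to SQL's WHERE clause. It is not MongoDB's matching engine: `matches` and `find` are hypothetical names, and the sketch ignores indexes, nested field paths and array semantics.

```python
import operator

# A small subset of MongoDB-style query operators; real MongoDB supports many more.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(doc, query):
    for field_name, condition in query.items():
        value = doc.get(field_name)
        is_operator_doc = bool(condition) and isinstance(condition, dict) and all(
            key.startswith("$") for key in condition
        )
        if is_operator_doc:
            for op, arg in condition.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                # In this sketch a missing field never satisfies an operator.
                if value is None or not OPERATORS[op](value, arg):
                    return False
        elif value != condition:
            return False  # a plain value means equality
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "ann", "age": 31, "city": "Oslo"},
    {"name": "bo", "age": 25, "city": "Rome"},
    {"name": "cy", "city": "Oslo"},  # no "age" field at all: there are no fixed columns
]
print([p["name"] for p in find(people, {"city": "Oslo", "age": {"$gte": 30}})])  # ['ann']
```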
Cassandra
- Written in: Java
- Main point: Best of BigTable and Dynamo
- License: Apache
- Protocol: Custom, binary (Thrift)
- Tunable trade-offs for distribution and replication (N, R, W)
- Querying by column, range of keys
- BigTable-like features: columns, column families
- Writes are much faster than reads (!)
- Map/reduce possible with Apache Hadoop
- I admit to being a bit biased against it because of the bloat and complexity it has, partly because of Java (configuration, seeing exceptions, etc.)
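The N, R, W knobs come down to one inequality. With N replicas, a read consults R of them and a write waits for W acknowledgements. A read is guaranteed to overlap the latest successful write when R + W > N. This small sketch checks that rule against brute force. Cassandra expresses the choice per request through consistency levels such as ONE, QUORUM and ALL. The function names here are made up, and the sketch ignores timestamps, hinted handoff and partially failed writes.

```python
from itertools import combinations


def quorum_overlap(n, r, w):
    """Dynamo-style rule: any R read replicas overlap any W write replicas iff R + W > N."""
    return r + w > n


def overlap_by_brute_force(n, r, w):
    # Check every possible read set against every possible write set.
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


N = 3
QUORUM = N // 2 + 1  # 2 of 3 replicas
for r, w in [(1, 1), (QUORUM, QUORUM), (1, N), (N, 1)]:
    print(f"N={N} R={r} W={w}: reads always see the latest write? {quorum_overlap(N, r, w)}")
```

R = W = 1 is the fastest choice but only eventually consistent. R = 1 with W = N makes reads cheap at the cost of slow, fragile writes, which is the opposite of what a write-heavy workload wants.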
Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")
For example: Banking, financial industry (though not necessarily for financial transactions; these industries are much bigger than that). Writes are faster than reads, so one natural niche is real-time data analysis.
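Here is why writes beat reads. Cassandra's write path appends to a commit log and updates an in-memory memtable, which is later flushed to immutable SSTables, so a write never has to read or seek. A read, on the other hand, may have to check the memtable and then several SSTables. The `ToyLSM` class below is a heavily simplified model of that path (no compaction, Bloom filters or timestamps), meant only to show where the asymmetry comes from.

```python
class ToyLSM:
    """Heavily simplified log-structured store, loosely modeled on Cassandra's write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # sequential, append-only record of writes
        self.memtable = {}  # recent writes, in memory
        self.sstables = []  # flushed tables, oldest first; never modified after flush
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is one append plus one in-memory update: no read-before-write, no seek.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table; its log entries are no longer needed.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        """Return (value, number of SSTables consulted)."""
        if key in self.memtable:
            return self.memtable[key], 0
        for consulted, table in enumerate(reversed(self.sstables), start=1):
            if key in table:
                return table[key], consulted  # the newest table holding the key wins
        return None, len(self.sstables)


db = ToyLSM(memtable_limit=2)
for key, value in [("a", "1"), ("b", "2"), ("c", "3"), ("a", "9"), ("d", "4")]:
    db.write(key, value)
print(db.read("a"), db.read("b"))  # ('9', 1) ('2', 2)
```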
Riak
- Written in: Erlang & C, some JavaScript
- Main point: Fault tolerance
- License: Apache
- Protocol: HTTP/REST
- Tunable trade-offs for distribution and replication (N, R, W)
- Pre- and post-commit hooks,
- for validation and security.
- Built-in full-text search
- Map/reduce in JavaScript or Erlang
- Comes in "open source" and "enterprise" editions
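Riak's pre-commit hooks run before a write is stored. They can either pass the object through (possibly modified) or reject the write. Post-commit hooks run afterwards and cannot change the outcome. The sketch below mimics that flow in plain Python. In Riak itself, hooks are configured per bucket, and pre-commit hooks can be written in Erlang or JavaScript. `ToyBucket`, `RejectedWrite` and the two example hooks are hypothetical.

```python
class RejectedWrite(Exception):
    """Raised by a pre-commit hook to refuse a write."""


class ToyBucket:
    """In-memory stand-in for a bucket with pre- and post-commit hooks (not Riak's API)."""

    def __init__(self, pre_commit=(), post_commit=()):
        self.pre_commit = list(pre_commit)
        self.post_commit = list(post_commit)
        self.data = {}

    def put(self, key, obj):
        for hook in self.pre_commit:
            obj = hook(obj)  # each hook may return a modified object or raise RejectedWrite
        self.data[key] = obj
        for hook in self.post_commit:
            hook(key, obj)  # runs after the write is stored; cannot change or undo it
        return obj


def require_email(obj):
    # Validation: refuse objects without a plausible email address.
    if "@" not in obj.get("email", ""):
        raise RejectedWrite("email missing or malformed")
    return obj


def drop_password(obj):
    # Security: never persist a plaintext password field.
    return {k: v for k, v in obj.items() if k != "password"}


audit_log = []
users = ToyBucket(
    pre_commit=[require_email, drop_password],
    post_commit=[lambda key, obj: audit_log.append(key)],
)
users.put("u1", {"email": "ann@example.com", "password": "hunter2"})
```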
Best used: If you want something Cassandra-like (Dynamo-like), but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.
For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt.
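Much of Riak's fault tolerance comes from its Dynamo-style ring. Keys are hashed onto a ring split into partitions, the partitions are spread across nodes, and each key's N replicas live on the partitions that follow it. When a node is down, requests go to other partitions further along the ring instead of failing. The sketch below models that with SHA-1 and a tiny 8-partition ring. The ring size, node names and the simplified fallback rule (skip partitions on failed nodes) are assumptions, not Riak's exact algorithm.

```python
import hashlib

RING_SIZE = 8  # hypothetical; real rings have many more partitions
NODES = ["node1", "node2", "node3", "node4"]
# Partitions are claimed round-robin, so neighbouring partitions sit on different nodes.
OWNER = [NODES[i % len(NODES)] for i in range(RING_SIZE)]


def partition_of(key):
    # SHA-1 gives a 160-bit position on the ring; map it onto one of RING_SIZE equal slices.
    position = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return (position * RING_SIZE) >> 160


def preference_list(key, n=3, down=frozenset()):
    """Walk clockwise from the key's partition, collecting n partitions on live nodes."""
    start = partition_of(key)
    chosen = []
    for step in range(RING_SIZE):
        partition = (start + step) % RING_SIZE
        if OWNER[partition] not in down:
            chosen.append((partition, OWNER[partition]))
            if len(chosen) == n:
                break
    return chosen


primaries = preference_list("cart:42")
print(primaries)
print(preference_list("cart:42", down={primaries[0][1]}))  # one node lost, still 3 replicas
```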
HBase (with the help of ghshephard)
- Written in: Java
- Main point: Billions of rows X millions of columns
- License: Apache
- Protocol: HTTP/REST (also Thrift)
- Modeled after BigTable
- Map/reduce with Hadoop
- Query predicate push down via server side scan and get filters
- Optimizations for real time queries
- A high performance Thrift gateway
- HTTP supports XML, Protobuf, and binary
- Cascading, Hive, and Pig source and sink modules
- JRuby-based (JIRB) shell
- No single point of failure
- Rolling restart for configuration changes and minor upgrades
- Random access performance is like MySQL
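Query predicate push-down is easiest to see with row keys. HBase keeps rows sorted by key, so a scan can jump straight to a start row and stop at a stop row. A filter evaluated on the region server means only matching rows travel back to the client. The sketch simulates one region in memory. In HBase's Java client this would be a `Scan` with start/stop rows and a `Filter`. `ToyRegion` and its `scan` method are hypothetical, and the sketch skips versions and region splits; the `"d:"` column names only mimic the family:qualifier convention.

```python
import bisect


class ToyRegion:
    """In-memory stand-in for one HBase region: rows kept sorted by row key."""

    def __init__(self, rows):
        self.rows = rows
        self.keys = sorted(rows)

    def scan(self, start_row, stop_row, row_filter=None):
        """Return (matching rows, rows examined) for keys in [start_row, stop_row)."""
        # Sorted keys let the scan binary-search its bounds instead of reading everything.
        lo = bisect.bisect_left(self.keys, start_row)
        hi = bisect.bisect_left(self.keys, stop_row)
        matched = []
        for key in self.keys[lo:hi]:
            row = self.rows[key]
            # Push-down: the predicate runs next to the data, so only matches go to the client.
            if row_filter is None or row_filter(key, row):
                matched.append((key, row))
        return matched, hi - lo


region = ToyRegion({
    "msg:alice:001": {"d:text": "hi", "d:unread": "1"},
    "msg:alice:002": {"d:text": "yo", "d:unread": "0"},
    "msg:alice:003": {"d:text": "ok", "d:unread": "1"},
    "msg:bob:001": {"d:text": "hey", "d:unread": "1"},
})
# ";" is the character right after ":" in ASCII, so this stop row ends the "msg:alice:" prefix.
unread, examined = region.scan("msg:alice:", "msg:alice;", lambda key, row: row.get("d:unread") == "1")
print([key for key, _ in unread], examined)  # ['msg:alice:001', 'msg:alice:003'] 3
```

Designing row keys so related data is contiguous (here, all of one user's messages) is what makes such scans cheap.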
Best used: If you're in love with BigTable. :) And when you need random, real-time read/write access to your Big Data.
For example: Facebook Messaging Database (more general example coming soon)
Of course, all these systems have many more features than what's listed here. I only wanted to list the key points that I base my decisions on. Also, all of them are developing very fast, so things are bound to change. I'll do my best to keep this list updated.
Original article: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis