A Year with MongoDB

 
Original article: http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
This week marks the one-year anniversary of Kiip running MongoDB in production. As of this week, we’ve also moved over 95% of our data off of MongoDB onto systems such as Riak and PostgreSQL, depending on which solution made sense for the way we use our data. This post highlights our experience with MongoDB over the past year. A future post will elaborate on the migration process: how we evaluated the proper solutions to migrate to and how we migrated the data from MongoDB.

First, some numbers about our data to give context to the scale being discussed. The figures below represent the peak usage when we were completely on MongoDB; the numbers are actually much higher now but are spread across different data stores.

Data size: 240 GB
Total documents: 85,000,000
Operations per second: 520 (creates, reads, updates, etc.)

The Good

We were initially attracted to MongoDB due to the features highlighted on the website as well as word of mouth from those who had used it successfully. MongoDB delivered on some of its promises, and our early experiences were positive.

Schemaless - Because MongoDB is a document data store, its schemaless nature helps a lot. It is easy to add new fields, and even to completely change the structure of a model. We changed the structure of our most heavily used models a couple of times in the past year, and instead of going back and updating millions of old documents, we simply added a “version” field to the document and let the application handle the logic of reading both the old and new versions. This flexibility was useful for both application developers and operations engineers.
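The “version” field approach can be sketched roughly as follows. This is a minimal illustration, not Kiip's actual code: the `load_user` function and the field names (`name`, `first_name`, `last_name`) are hypothetical, and it assumes that documents written before versioning was introduced carry no “version” field at all.

```python
from typing import Any, Dict


def load_user(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a stored document to the newest in-memory shape."""
    # Assumption: documents written before versioning have no "version" key.
    version = doc.get("version", 1)
    if version == 1:
        # Hypothetical v1 layout: one "name" string holding the full name.
        first, _, last = doc["name"].partition(" ")
        return {"first_name": first, "last_name": last, "version": 2}
    if version == 2:
        # Hypothetical v2 layout: the name is already split into two fields.
        return {
            "first_name": doc["first_name"],
            "last_name": doc["last_name"],
            "version": 2,
        }
    raise ValueError(f"unknown document version: {version!r}")
```

Old documents are upgraded only when they are read, so no bulk migration is needed; the cost is that the application has to keep a reader for every version still present in the database.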

Simple replication - Replica Sets are easy to set up and work well enough. There are some issues that I’ll talk about later, but for the most part, as an early stage startup, we found this feature easy to incorporate, and it appeared to work as advertised.

Query Language - Querying into documents and being able to perform atomic operations on your data is pretty cool, and we used both of these features heavily. Early on we were able to use advanced queries to build features quickly into our application. Unfortunately, these queries didn’t scale due to underlying architectural problems.

Full-featured Drivers for Many Languages - 10gen curates official MongoDB drivers for many languages, and in our experience the driver for each language we’ve tried has been top-notch. Drivers were never an issue when working with MongoDB.

The Bad

Although MongoDB has a lot of nice features on the surface, most of them are marred by underlying architectural issues. These issues are certainly fixable, but currently limit the practical usage we were able to achieve with MongoDB. This list highlights some of the major issues we ran into.

Non-counting B-Trees - MongoDB uses non-counting B-trees as the underlying data structure to index data. This impacts a lot of what you’re able to do with MongoDB. It means that a simple count of a collection on an indexed field requires Mongo to traverse the entire matching subset of the B-tree. To support limit/offset queries, MongoDB needs to traverse the leaves of the B-tree up to the requested offset. This unnecessary traversal causes data you don’t need to be faulted into memory, potentially evicting warm or hot data and hurting your overall throughput. There has been an open ticket for this issue since September 2010.
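To make the cost difference concrete, here is a toy model of an offset lookup. It is not MongoDB's storage engine: the index is reduced to a list of sorted leaf pages, and the `cumulative` list stands in for the per-subtree counts that a counting B-tree would keep in its interior nodes. All names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Leaves = List[List[int]]  # each inner list is one sorted leaf page of index keys


def skip_non_counting(leaves: Leaves, offset: int) -> Tuple[int, int]:
    """Return (key at offset, leaf pages touched) by walking leaves in order."""
    touched = 0
    remaining = offset
    for leaf in leaves:
        touched += 1  # every page before the target has to be read
        if remaining < len(leaf):
            return leaf[remaining], touched
        remaining -= len(leaf)
    raise IndexError("offset past end of index")


def build_counts(leaves: Leaves) -> List[int]:
    """Cumulative entry counts; a counting tree would maintain these on write."""
    return list(accumulate(len(leaf) for leaf in leaves))


def skip_counting(leaves: Leaves, cumulative: List[int], offset: int) -> Tuple[int, int]:
    """Return (key at offset, leaf pages touched) by jumping straight to the leaf."""
    i = bisect_right(cumulative, offset)  # first leaf whose running total exceeds offset
    if i == len(leaves):
        raise IndexError("offset past end of index")
    before = cumulative[i - 1] if i else 0
    return leaves[i][offset - before], 1  # only the target leaf page is read
```

In this model the non-counting walk touches every leaf page before the target, which is the traversal that pulls unneeded pages into memory, while the counting version reads only the one leaf it needs.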

Poor Memory Management - MongoDB manages memory by memory mapping your entire data set, leaving page cache management and faulting up to the kernel. A more intelligent scheme would be able to do things like fault in your indexes before use as well as handle faulting in of cold/hot data more effectively. The result is that memory usage can’t be effectively reasoned about, and performance is non-optimal.

Uncompressed field names - If you store 1,000 documents with the key “foo”, then “foo” is stored 1,000 times in your data set. Although MongoDB supports any arbitrary document, in practice most of your field names are similar. It is considered good practice to shorten field names for space optimization. A ticket for this issue has been open since April 2010, yet this problem still exists today. At Kiip, we built field aliasing into our model layer, so a field with name “username” may actually map to “u” in the database. The database should handle this transparently by keeping a logical mapping between field names and a compressed form, instead of requiring clients to handle it explicitly.
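Field aliasing in a model layer might look something like the sketch below. This is an illustration rather than Kiip's implementation; the alias table and the function names `to_storage` and `from_storage` are hypothetical.

```python
from typing import Any, Dict

# Hypothetical alias table: logical field name -> short key stored in the database.
ALIASES: Dict[str, str] = {"username": "u", "email": "e", "created_at": "c"}
REVERSE: Dict[str, str] = {short: long for long, short in ALIASES.items()}


def to_storage(model: Dict[str, Any]) -> Dict[str, Any]:
    """Rename logical field names to their short stored keys before writing."""
    return {ALIASES.get(key, key): value for key, value in model.items()}


def from_storage(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Rename short stored keys back to logical field names after reading."""
    return {REVERSE.get(key, key): value for key, value in doc.items()}
```

Fields without an alias pass through unchanged. One limitation of this sketch is that an unaliased field whose real name happens to equal a short key (for example a field literally called “u”) would be renamed on read, so the alias table has to be chosen to avoid such collisions.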

Global write lock - MongoDB (as of the current version at the time of writing: 2.0) has a process-wide write lock. Conceptually this makes no sense. A write on collection X blocks a write on collection Y, despite MongoDB having no concept of transactions or join semantics. We reached practical limitations of MongoDB when pushing a mere 200 updates per second to a single server. At this point, all other operations including reads are blocked because of the write lock. When we reached out to 10gen for assistance, they recommended we look into sharding, since that is their general scaling solution. With an RDBMS, we would at least be able to continue vertically scaling for some time before investigating sharding as a solution.

Safe off by default - This is a crazy default, although useful for benchmarks. With safe mode off, the driver does not wait for the server to acknowledge a write, so a failed write goes unnoticed by the application. As a general analogy: it’s like a car manufacturer shipping a car with air bags off, then shrugging and saying “you could’ve turned them on” when something goes wrong. We lost a sizable amount of data at Kiip before realizing what was happening and switching to safe saves where they made sense (user accounts, billing, etc.).

Offline table compaction - The on-disk data size with MongoDB grows unbounded until you compact the database. Compaction is extremely time-consuming and blocks all other DB operations, so it must be done offline or on a secondary/slave server. Traditional RDBMSs such as PostgreSQL have handled this with auto-vacuums that clean up the database over time.

Secondaries do not keep hot data in RAM - The primary doesn’t relay queries to secondary servers, preventing secondaries from maintaining hot data in memory. This severely hinders the “hot-standby” feature of replica sets, since the moment the primary fails and switches to a secondary, all the hot data must be once again faulted into memory. Faulting in gigabytes of data can be painfully slow, especially when your data is backed by something like EBS. Distributing reads to secondaries helps with this, but if you’re only using secondaries as a means of backup or failover, the effect on throughput when a primary switch happens can be crippling until your hot data is faulted in.

What We’re Doing Now

Initially, we felt MongoDB gave us the flexibility and power we needed in a database. Unfortunately, underlying architectural issues forced us to investigate other solutions rather quickly. We never attempted to horizontally scale MongoDB since our confidence in the product was hurt by the time that was offered as a solution, and because we believe horizontally scaling shouldn’t be necessary for the relatively small amount of ops per second we were sending to MongoDB.

Over the past six months, we’ve “scaled” MongoDB by moving data off of it. This process is an entire blog post in itself, but the gist of the matter is that we looked at our data access patterns and chose the right tool for the job. For key-value data, we switched to Riak, which provides predictable read/write latencies and is completely horizontally scalable. For smaller sets of relational data where we wanted a rich query layer, we moved to PostgreSQL. A small fraction of our data, which we didn’t need to persist or query later, has been moved to non-durable, purely in-memory solutions.

In retrospect, MongoDB was not the right solution for Kiip. Although it may be a bit more upfront effort, we recommend using PostgreSQL (or some traditional RDBMS) first, then investigating other solutions if and when you find them necessary. In future blog posts, we’ll talk about how we chose our data stores and the steps we took to migrate data while minimizing downtime.