MongoDB Schema Design


Reposted from the official MongoDB website: http://www.mongodb.org/display/DOCS/Schema+Design

Schema Design

  • Introduction
  • Embed vs. Reference
  • Use Cases
  • Index Selection
  • How Many Collections?
  • See Also

Introduction

With Mongo, you do less "normalization" than you would perform designing a relational schema because there are no server-side "joins". Generally, you will want one database collection for each of your top level objects.

You do not want a collection for every "class"; instead, embed objects. For example, the original page shows a diagram (not reproduced here) with two collections, students and courses. The student documents embed address documents and "score" documents, and each score document holds a reference to a course.

Compare this with a relational schema, where you would almost certainly put the scores in a separate table, and have a foreign-key relationship back to the students.
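As a rough illustration of the layout described above, the two collections might hold documents shaped like the ones below. Only the names address, scores, and for_course come from the text; every value, including the _id values and the grade field, is invented for this sketch.

```javascript
// Hypothetical documents; all values are invented for illustration.
const course = {
  _id: "course-101",
  name: "Biology"
};

const student = {
  _id: "student-1",
  name: "Joe",
  // Embedded document: stored inside the student, no separate collection.
  address: { street: "123 Main St", city: "Springfield" },
  // Embedded array of score documents, each holding a reference to a course.
  scores: [
    { for_course: course._id, grade: 4.0 }
  ]
};

// Reading embedded data needs no extra lookup.
console.log(student.address.city);
// The score stores only the course _id; the course document lives in courses.
console.log(student.scores[0].for_course);
```

In a relational design the scores array would typically become its own table; here it travels with the student document.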

Embed vs. Reference

The key question in Mongo schema design is "does this object merit its own collection, or should it be embedded in objects in other collections?" In relational databases, each sub-item of interest typically becomes a separate table (unless denormalizing for performance). In Mongo, this is not recommended: embedding objects is much more efficient. The data is then colocated on disk, and extra client-server round trips to the database are eliminated. So in general the question to ask is, "why would I not want to embed this object?"

So why are references slow? Let's consider our students example. If we have a student object and perform:

 

print( student.address.city );

 

This operation will always be fast as address is an embedded object, and is always in RAM if student is in RAM. However for

 

print( student.scores[0].for_course.name );

 

if this is the first access to scores[0], the shell or your driver must execute the query

// pseudocode for driver or framework, not user code

student.scores[0].for_course = db.courses.findOne({_id:_course_id_to_find_});

 

Thus, each reference traversal is a query to the database. Typically, the collection in question is indexed on _id, so the query will be reasonably fast. However, even if all data is in RAM, there is some latency from the client/server communication between the app server and the database. In general, expect about 1 ms for such a query on a RAM cache hit. Thus, if we were iterating over 1,000 students, looking up one reference per student would be quite slow: 1 second or more (1,000 × 1 ms) even if everything is cached. However, if we only need to look up a single item, the time is on the order of 1 ms, which is completely acceptable for a web page load. (Note that if the data is already in the database cache, pulling the 1,000 students themselves might take much less than 1 second, as results are returned from the database in large batches.)
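The arithmetic above can be checked with a small in-memory model that simply counts round trips. FakeCollection and its methods are hypothetical stand-ins, not a MongoDB driver API. The batched variant is an addition that the original text does not discuss; it corresponds roughly to fetching all referenced documents at once with a filter such as {_id: {$in: [...]}}.

```javascript
// In-memory stand-in for a courses collection that counts round trips.
// This is NOT a MongoDB driver; it only mimics the idea of one query per call.
class FakeCollection {
  constructor(docs) {
    this.byId = new Map(docs.map((d) => [d._id, d]));
    this.roundTrips = 0;
  }
  // One query per call, like a single reference traversal.
  findOne(query) {
    this.roundTrips += 1;
    return this.byId.get(query._id) || null;
  }
  // One query for many ids, similar in spirit to an $in filter on _id.
  findByIds(ids) {
    this.roundTrips += 1;
    return ids.map((id) => this.byId.get(id)).filter((d) => d !== undefined);
  }
}

const courses = new FakeCollection([
  { _id: 1, name: "Biology" },
  { _id: 2, name: "Chemistry" }
]);

// 1,000 students, each with one score that references a course.
const students = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  scores: [{ for_course: (i % 2) + 1 }]
}));

// Approach A: dereference one student at a time (one round trip each).
for (const s of students) {
  courses.findOne({ _id: s.scores[0].for_course });
}
const perReferenceTrips = courses.roundTrips;

// Approach B: collect the distinct ids, then fetch them in a single query.
courses.roundTrips = 0;
const ids = [...new Set(students.map((s) => s.scores[0].for_course))];
const fetched = courses.findByIds(ids);
const batchedTrips = courses.roundTrips;

const MS_PER_TRIP = 1; // the article's rough estimate for a cached lookup
console.log(perReferenceTrips * MS_PER_TRIP, "ms vs", batchedTrips * MS_PER_TRIP, "ms");
```

The model ignores server-side work entirely; it only makes visible that latency scales with the number of round trips, which is the cost embedding avoids.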

Some general rules on when to embed, and when to reference:

  • "First class" objects, that are at top level, typically have their own collection.
  • Line item detail objects typically are embedded.
  • Objects which follow an object modelling "contains" relationship should generally be embedded.
  • Many to many relationships are generally by reference.
  • Collections with only a few objects may safely exist as separate collections, as the whole collection is quickly cached in application server memory.
  • Embedded objects are harder to reference than "top level" objects in collections, as you cannot have a DBRef to an embedded object (at least not yet).
  • It is more difficult to get a system-level view for embedded objects. For example, it would be easier to query the top 100 scores across all students if Scores were not embedded.
  • If the amount of data to embed is huge (many megabytes), you may reach the limit on the size of a single document (16 MB in current MongoDB releases).
  • If performance is an issue, embed.
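To make the "system-level view" rule above concrete, here is a minimal sketch of what a top-N query over embedded scores involves. The sample data and the topScores helper are hypothetical. The point is that every student document has to be visited and its scores array flattened before sorting, whereas a separate scores collection could be sorted directly.

```javascript
// Hypothetical student documents with embedded scores.
const students = [
  { name: "Ann", scores: [{ for_course: "c1", grade: 91 }, { for_course: "c2", grade: 78 }] },
  { name: "Bob", scores: [{ for_course: "c1", grade: 85 }] },
  { name: "Cid", scores: [{ for_course: "c2", grade: 97 }] }
];

// Flatten every embedded score, tag it with its owner, sort, and keep the top n.
function topScores(studentDocs, n) {
  return studentDocs
    .flatMap((s) => s.scores.map((sc) => ({ student: s.name, ...sc })))
    .sort((a, b) => b.grade - a.grade)
    .slice(0, n);
}

const top2 = topScores(students, 2);
console.log(top2.map((t) => `${t.student}: ${t.grade}`));
```

On the server, a comparable flattening step is available through the aggregation framework's $unwind stage, but it still has to walk the embedded arrays.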

Use Cases

Let's consider a few use cases now.

  1. Customer / Order / Order Line-Item
  • orders should be a collection. customers should be a collection. line-items should be an array of line-items embedded in the order object.
  2. Blogging system
  • posts should be a collection. post author might be a separate collection, or simply a field within posts if only an email address. comments should be embedded objects within a post for performance.
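The two use cases above might translate into documents like the following. This is only a sketch: the field names (customer_id, line_items, unit_price, and so on) and all values are assumptions made for illustration, not names taken from the original page.

```javascript
// Use case 1: an order references its customer and embeds its line items.
const order = {
  _id: "order-1001",
  customer_id: "cust-42", // reference to a document in the customers collection
  line_items: [
    { sku: "A-1", qty: 2, unit_price: 5.0 },
    { sku: "B-7", qty: 1, unit_price: 12.5 }
  ]
};

// Use case 2: a post stores the author as a plain field and embeds comments.
const post = {
  _id: "post-7",
  title: "Schema design notes",
  author: "author@example.com",
  comments: [
    { by: "reader1", text: "Nice summary." },
    { by: "reader2", text: "What about indexes?" }
  ]
};

// Everything needed to total the order arrives with the order document itself.
const orderTotal = order.line_items.reduce(
  (sum, li) => sum + li.qty * li.unit_price,
  0
);
console.log(orderTotal); // 22.5
console.log(post.comments.length); // 2
```

Rendering an order or a post page then needs one document fetch, plus at most one lookup for the customer.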

Index Selection

A second aspect of schema design is index selection. As a general rule, where you want an index in a relational database, you want an index in Mongo.

  • The _id field is automatically indexed.
  • Fields upon which keys are looked up should be indexed.
  • Sort fields generally should be indexed.

The MongoDB profiling facility helps identify queries where a missing index should be added.

Note that adding an index slows writes to a collection, but not reads. Use lots of indexes for collections with a high read-to-write ratio (assuming the storage overhead is acceptable). For collections with more writes than reads, indexes are very expensive.
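The read/write trade-off can be illustrated with a toy in-memory model. TinyCollection is entirely hypothetical and uses a hash map per indexed field, whereas real MongoDB indexes are B-tree structures maintained by the server. The sketch only shows the shape of the cost: each insert does extra work per index, and only indexed lookups avoid scanning every document.

```javascript
// Toy model only; real MongoDB indexes live on the server and are B-trees.
class TinyCollection {
  constructor(indexedFields) {
    this.docs = [];
    this.indexes = new Map(indexedFields.map((f) => [f, new Map()]));
    this.indexWrites = 0; // extra work caused by maintaining indexes
  }
  insert(doc) {
    this.docs.push(doc);
    // Every index must be updated on every write.
    for (const [field, index] of this.indexes) {
      const key = doc[field];
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc);
      this.indexWrites += 1;
    }
  }
  // Returns the matching docs and how many documents had to be examined.
  find(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      const hits = index.get(value) || [];
      return { docs: hits, examined: hits.length };
    }
    // No index on this field: fall back to scanning the whole collection.
    const hits = this.docs.filter((d) => d[field] === value);
    return { docs: hits, examined: this.docs.length };
  }
}

const people = new TinyCollection(["email"]);
for (let i = 0; i < 100; i++) {
  people.insert({ email: `user${i}@example.com`, city: i % 2 ? "Shanghai" : "Beijing" });
}

const byEmail = people.find("email", "user7@example.com"); // uses the index
const byCity = people.find("city", "Shanghai");            // full scan
console.log(byEmail.examined, byCity.examined, people.indexWrites);
```

Adding "city" to the indexed fields would cut the second lookup's examined count but double indexWrites, which is the trade-off the paragraph above describes.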

How Many Collections?

As Mongo collections are polymorphic, one could have a single collection named objects and put everything in it! This approach is taken by some object databases. For performance reasons, we do not recommend it. Data within a Mongo collection tends to be contiguous on disk, so table scans of a collection are possible and efficient. Collections are therefore very important for high-throughput batch processing.

See Also
