Brief Introduction of MongoDB

 

1.  MongoDB is a database management system designed for web applications and internet infrastructure. The data model and persistence strategies are built for high read and write throughput and the ability to scale easily with automatic failover. Whether an application requires just one database node or dozens of them, MongoDB can provide surprisingly good performance.

 

2.  Because a document-based data model can represent rich, hierarchical data structures, it’s often possible to do without the complicated multi-table joins imposed by relational databases. When you open the MongoDB JavaScript shell, you can easily get a comprehensible representation of your product with all its information hierarchically organized in a JSON-like structure. With MongoDB, the object defined in the programming language can be persisted “as is,” removing some of the complexity of object mappers.

 

3.  MongoDB was originally developed by 10gen for a platform hosting web applications that, by definition, required its database to scale gracefully across multiple machines. Its design as a horizontally scalable primary data store sets it apart from other modern database systems.

 

4.  MongoDB’s data model is document-oriented. A document is essentially a set of property names and their values. The values can be simple data types, such as strings, numbers, and dates. But these values can also be arrays and even other documents. A document-oriented data model naturally represents data in an aggregate form, allowing you to work with an object holistically.
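A minimal sketch in the mongo JavaScript shell of a document holding simple values, an array, and an embedded document (the collection name products and all field values here are invented for illustration):

    > db.products.insert({
        name: "widget",
        price: 9.99,
        tags: ["hardware", "sale"],                     // array value
        manufacturer: { name: "Acme", country: "US" },  // embedded document
        added: new Date()
      })
    > db.products.findOne({ name: "widget" })   // returns the whole aggregate as one document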

 

5.  Documents need not conform to a prespecified schema. MongoDB groups documents into collections, containers that don’t impose any sort of schema. In theory, each document in a collection can have a completely different structure; in practice, a collection’s documents will be relatively uniform.
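For example, nothing prevents two documents with different shapes from living in the same collection, and no schema is declared ahead of time (a shell sketch with made-up collection and field names):

    > db.articles.insert({ title: "First post", votes: 5 })
    > db.articles.insert({ title: "Second post",
                           author: { name: "Jane" },    // extra embedded field
                           comments: [] })              // extra array field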

 

6.  To say that a system supports ad hoc queries is to say that it’s not necessary to define in advance what sorts of queries the system will accept. Relational databases have this property; key-value store systems usually sacrifice rich query power in exchange for a simple scalability model. But MongoDB still supports ad hoc queries.
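As a hedged illustration, an ad hoc query in the shell can combine arbitrary criteria, sorting, and limits without any prior declaration (the users collection and its fields are hypothetical):

    > db.users.find({ age: { $gt: 30 }, "address.city": "Shanghai" }).sort({ age: -1 }).limit(10)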

 

7.  Proper indexes will increase query and sort speeds by orders of magnitude; consequently, any system that supports ad hoc queries should also support secondary indexes. Secondary indexes in MongoDB are implemented as B-trees. B-tree indexes, also the default for most relational databases, are optimized for a variety of queries, including range scans and queries with sort clauses. By permitting multiple secondary indexes, MongoDB allows users to optimize for a wide variety of queries. With MongoDB, you can create up to 64 indexes per collection.
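A sketch of declaring secondary indexes in the shell (collection and field names are assumptions; ensureIndex was the shell helper in the 2.x series, while newer releases call it createIndex):

    > db.users.ensureIndex({ age: 1 })                // single-field index for range queries on age
    > db.users.ensureIndex({ lastName: 1, age: -1 })  // compound index supporting sorted queries
    > db.users.getIndexes()                           // list the indexes defined on the collection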

 

8.  MongoDB provides database replication via a topology known as a replica set. Replica sets distribute data across machines for redundancy and automate failover in the event of server and network outages. Replica sets consist of exactly one primary node and one or more secondary nodes. A replica set’s primary node can accept both reads and writes, but the secondary nodes are read-only. If the primary node fails, the cluster will pick a secondary node and automatically promote it to primary. When the former primary comes back online, it’ll do so as a secondary.
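A rough sketch of initiating a three-member replica set from the shell (the set name rs0 and the hostnames are placeholders):

    > rs.initiate({
        _id: "rs0",
        members: [
          { _id: 0, host: "db1.example.com:27017" },
          { _id: 1, host: "db2.example.com:27017" },
          { _id: 2, host: "db3.example.com:27017" }
        ]
      })
    > rs.status()   // inspect which member is primary and which are secondaries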

 

9.  In MongoDB, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling. All writes, by default, are fire-and-forget, which means that these writes are sent across a TCP socket without requiring a database response. If users want a response, they can issue a write using a special safe mode provided by all drivers. This forces a response, ensuring that the write has been received by the server with no errors. Safe mode is configurable; it can also be used to block until a write has been replicated to some number of servers.
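In the shell, the effect of a driver’s safe mode can be approximated by following a write with the getLastError command; the collection name and the write-concern values below are only illustrative:

    > db.orders.insert({ item: "abc", qty: 1 })
    > db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 })
      // blocks until the write has reached 2 servers, or reports an error after 5 seconds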

 

10.  In MongoDB v2.0, journaling is enabled by default. With journaling, every write is committed to an append-only log. If the server is ever shut down uncleanly (say, in a power outage), the journal will be used to ensure that MongoDB’s data files are restored to a consistent state when you restart the server.

 

11.  The technique of augmenting a single node’s hardware for scale is known as vertical scaling or scaling up. Vertical scaling has the advantages of being simple, reliable, and cost-effective up to a certain point. Instead of beefing up a single node, scaling horizontally (or scaling out) means distributing the database across multiple machines. The distribution of data across machines mitigates the consequences of failure.

 

12. MongoDB has been designed to make horizontal scaling manageable. It does so via a range-based partitioning mechanism, known as auto-sharding, which automatically manages the distribution of data across nodes. The sharding system handles the addition of shard nodes, and it also facilitates automatic failover. Individual shards are made up of a replica set consisting of at least two nodes, ensuring automatic recovery with no single point of failure. All this means that no application code has to handle these logistics; your application code communicates with a sharded cluster just as it would with a single node.
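A hedged sketch of the administrative commands involved, issued against a mongos router (the database, collection, shard, and key names are all made up):

    > db.adminCommand({ addShard: "shard0/db1.example.com:27017" })              // register a shard (a replica set)
    > db.adminCommand({ enableSharding: "mydb" })                                // allow the database to be sharded
    > db.adminCommand({ shardCollection: "mydb.users", key: { username: 1 } })   // choose the shard key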

 

13. MongoDB is written in C++. The project compiles on all major operating systems, including Mac OS X, Windows, and most flavors of Linux. MongoDB is open source and licensed under the GNU-AGPL. The source code is freely available on GitHub. The project is guided by the 10gen core server team.

 

14. The core database server runs via an executable called mongod (mongodb.exe on Windows). The mongod server process receives commands over a network socket using a custom binary protocol. All the data files for a mongod process are stored by default in /data/db (c:\data\db on Windows).
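A typical way to start the server by hand, with the data directory and port spelled out explicitly (the values shown simply restate the defaults; adjust them as needed):

    $ mongod --dbpath /data/db --port 27017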

 

15. mongod can be run in several modes, the most common of which is as a member of a replica set. Replica set configurations generally consist of two replicas, plus an arbiter process residing on a third server. For MongoDB’s auto-sharding architecture, the components consist of mongod processes configured as per-shard replica sets, with special metadata servers, known as config servers, on the side. A separate routing server called mongos is also used to send requests to the appropriate shard.
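Roughly, the processes for a small sharded deployment might be launched like this (hostnames, paths, and set names are placeholders, and exact flags vary between releases):

    $ mongod --shardsvr --replSet shard0 --dbpath /data/shard0     # one member of a shard's replica set
    $ mongod --configsvr --dbpath /data/configdb                   # a config (metadata) server
    $ mongos --configdb cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019   # router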

 

16. The MongoDB command shell is a JavaScript-based tool for administering the database and manipulating data. The mongo executable loads the shell and connects to a specified mongod process. Most commands are issued using JavaScript expressions. The shell also permits you to run administrative commands. Some examples include viewing the current database operation, checking the status of replication to a secondary node, and configuring a collection for sharding.
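A few representative shell commands of this kind (the connection string is an assumption, and sharding status is only meaningful when connected through a mongos):

    $ mongo localhost:27017/mydb
    > db.currentOp()              // view in-progress database operations
    > rs.status()                 // check replication status of the replica set
    > db.printShardingStatus()    // show how collections are distributed across shards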

17. All documents require a primary key stored in the _id field. You’re allowed to enter a custom _id as long as you can guarantee its uniqueness. But if you omit the _id altogether, then a MongoDB object ID will be inserted automatically.
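A quick shell illustration of both cases (the collection and values are hypothetical):

    > db.users.insert({ _id: "jane@example.com", name: "Jane" })   // custom _id; must be unique in the collection
    > db.users.insert({ name: "Bob" })                             // no _id given, so an ObjectId is generated
    > db.users.findOne({ name: "Bob" })                            // the returned document includes the generated _id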

18. All of the MongoDB drivers implement similar methods for saving a document to a collection, but the representation of the document itself will usually be whatever is most natural to each language. In Java you represent documents using a special document builder class that implements LinkedHashMap.

19. Little abstraction beyond the driver itself is required to build an application. Many developers like using a thin wrapper (such as Morphia for Java) over the drivers to handle associations, validations, and type checking.

 

20. MongoDB is bundled with several command-line utilities (example invocations follow the list):
  a)  mongodump and mongorestore—Standard utilities for backing up and restoring a database. mongodump saves the database’s data in its native BSON format; this tool has the advantage of being usable for hot backups, which can easily be restored with mongorestore.
  b)  mongoexport and mongoimport—These utilities export and import JSON, CSV, and TSV data.
  c)  mongosniff—A wire-sniffing tool for viewing operations sent to the database. It essentially translates the BSON going over the wire into human-readable shell statements.
  d)  mongostat—Similar to iostat; constantly polls MongoDB and the system to provide helpful stats, including the number of operations per second, the amount of virtual memory allocated, and the number of connections to the server.

  Please refer to http://docs.mongodb.org/manual/reference/ for details.
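A few illustrative invocations (host, paths, database, and collection names are placeholders, and flag spellings differ slightly between releases):

    $ mongodump --host localhost:27017 --out /backups/snapshot   # hot backup in BSON format
    $ mongorestore /backups/snapshot                             # restore the dumped data
    $ mongoexport -d mydb -c users -o users.json                 # export a collection as JSON
    $ mongostat 5                                                # poll server stats every 5 seconds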

21. MongoDB is well suited as a primary data store for web applications, for analytics and logging applications, and for any application requiring a medium-grade cache. In addition, because it easily stores schemaless data, MongoDB is also good for capturing data whose structure can’t be known in advance.

22. The best-known simple key-value store is memcached (pronounced mem-cash-dee). Memcached stores its data in memory only, so it trades persistence for speed. It’s also distributed; memcached nodes running across multiple servers can act as a single data store, eliminating the complexity of maintaining cache state across machines. Compared with MongoDB, a simple key-value store like memcached will often allow for faster reads and writes. But unlike MongoDB, these systems can rarely act as primary data stores. Simple key-value stores are best used as adjuncts, either as caching layers atop a more traditional database or as simple persistence layers for ephemeral services like job queues.

 

23. Sophisticated key-value stores manage a relatively self-contained domain that demands significant storage and availability. Because of their masterless architecture, these systems scale easily with the addition of nodes. They opt for eventual consistency, which means that reads don’t necessarily reflect the latest write. But what users get in exchange for weaker consistency is the ability to write in the face of any one node’s failure. This contrasts with MongoDB, which provides strong consistency, a single master (per shard), a richer data model, and secondary indexes.

 

24. MongoDB and RDBMSs are both capable of representing a rich data model, although where an RDBMS uses fixed-schema tables, MongoDB has schema-free documents. RDBMSs and MongoDB both support B-tree indexes. An RDBMS supports both joins and transactions, so if you must use SQL or if you require transactions, then you’ll need to use an RDBMS. That said, MongoDB’s document model is often rich enough to represent objects without requiring joins. And its updates can be applied atomically to individual documents, providing a subset of what’s possible with traditional transactions. Both MongoDB and RDBMSs support replication. MongoDB has been designed to scale horizontally, with sharding and failover handled automatically. Any sharding on an RDBMS has to be managed manually, and given the complexity involved, it’s more common to see a vertically scaled RDBMS system.

25. MongoDB works well for analytics and logging. MongoDB’s relevance to analytics derives from its speed and from two key features: targeted atomic updates and capped collections. Atomic updates let clients efficiently increment counters and push values onto arrays. Capped collections, often useful for logging, feature fixed allocation, which lets them age out automatically. Storing logging data in a database, as compared with the file system, provides easier organization and much greater query power. Now, instead of using grep or a custom log search utility, users can employ the MongoDB query language they know and love to examine log output.
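A sketch of both features in the shell (the collection names, the counter field, and the 1 MB size are arbitrary choices for illustration):

    > db.pageviews.update({ url: "/home" }, { $inc: { hits: 1 } })   // atomically increment a counter
    > db.events.update({ _id: 1 }, { $push: { tags: "error" } })     // atomically append to an array
    > db.createCollection("log", { capped: true, size: 1048576 })    // fixed-size collection; oldest entries age out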

26. MongoDB should usually be run on 64-bit machines. 32-bit systems are capable of addressing only 4 GB of memory. Since typically half of this memory will be allocated by the operating system and program processes, that leaves just 2 GB of memory on which to map the data files. So if you’re running 32-bit, and if you have even a modest number of indexes defined, you’ll be strictly limited to as little as 1.5 GB of data.

 

27. A second consequence of using virtual memory mapping is that memory for the data will be allocated automatically, as needed. This makes it trickier to run the database in a shared environment. As with database servers in general, MongoDB is best run on a dedicated server.

 

28. It’s important to run MongoDB with replication, especially if you’re not running with journaling enabled. Because MongoDB uses memory-mapped files, any unclean shutdown of a mongod not running with journaling may result in corruption. Therefore, it’s necessary in this case to have a replicated backup available for failover.

 

Why MongoDB?

  • Document-oriented
    • Documents (objects) map nicely to programming language data types
    • Embedded documents and arrays reduce need for joins
    • Dynamically-typed (schemaless) for easy schema evolution
    • No joins and no multi-document transactions for high performance and easy scalability
  • High performance
    • No joins and embedding make reads and writes fast
    • Indexes, including indexing of keys from embedded documents and arrays
    • Optional streaming writes (no acknowledgements)
  • High availability
    • Replicated servers with automatic master failover
  • Easy scalability
    • Automatic sharding (auto-partitioning of data across servers)
      • Reads and writes are distributed over shards
      • No joins or multi-document transactions make distributed queries easy and fast
    • Eventually-consistent reads can be distributed over replicated servers
  • Rich query language

Large MongoDB deployment

1. One or more shards; each shard holds a portion of the total data (managed automatically). Reads and writes are automatically routed to the appropriate shard(s). Each shard is backed by a replica set, which holds just the data for that shard.

2. A replica set is one or more servers, each holding copies of the same data. At any given time one is primary and the rest are secondaries. If the primary goes down, one of the secondaries takes over automatically as primary. All writes and consistent reads go to the primary, and all eventually consistent reads are distributed amongst all the secondaries.

3. Multiple config servers, each holding a copy of the metadata indicating which data lives on which shard.

4. One or more routers, each acting as a server for one or more clients. Clients issue queries/updates to a router, and the router routes them to the appropriate shard while consulting the config servers.

5. One or more clients, each of which is (part of) the user's application and issues commands to a router via the mongo client library (driver) for its language.

mongod is the server program (data or config). mongos is the router program.


Small deployment (no partitioning)

1. One replica set (automatic failover), or one server with zero or more slaves (no automatic failover).

2. One or more clients issuing commands to the replica set as a whole or the single master (the driver will manage which server in the replica set to send to).

Mongo data model

  • A Mongo system (see deployment above) holds a set of databases
  • A database holds a set of collections
  • A collection holds a set of documents
  • A document is a set of fields
  • A field is a key-value pair
  • A key is a name (string)
  • A value is a
    • basic type like string, integer, float, timestamp, binary, etc.,
    • a document, or
    • an array of values