`
leonzhx
  • 浏览: 804139 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论

Brief Introduction of MongoDB

 
阅读更多

1.  MongoDB is a database management system designed for web applications and internet infrastructure. The data model and persistence strategies are built for high read and write throughput and the ability to scale easily with automatic failover. Whether an application requires just one database node or dozens of them, MongoDB can provide surprisingly good performance.

 

2.  A document-based data model can represent rich, hierarchical data structures, it’s often possible to do without the complicated multi-table joins imposed by relational databases. When you open the MongoDB JavaScript shell, you can easily get a comprehensible representation of your product with all its information hierarchically organized in a JSON-like structure. With MongoDB, the object defined in the programming language can be persisted “as is,” removing some of the complexity of object mappers.

 

3.  MongoDB was originally developed by 10gen for a platform hosting web applications that, by definition, required its database to scale gracefully across multiple machines. Its design as a horizontally scalable primary data store sets it apart from other modern database systems.

 

4.  MongoDB’s data model is document-oriented. A document is essentially a set of property names and their values. The values can be simple data types, such as strings, numbers, and dates. But these values can also be arrays and even other documents. A document-oriented data model naturally represents data in an aggregate form, allowing you to work with an object holistically.

 

5.  Documents need not conform to a prespecified schema. MongoDB groups documents into collections, containers that don’t impose any sort of schema. In theory, each document in a collection can have a completely different structure; in practice, a collection’s documents will be relatively uniform.

 

6.  To say that a system supports ad hoc queries is to say that it’s not necessary to define in advance what sorts of queries the system will accept. Relational databases have this property; key-value store systems usually sacrifice rich query power in exchange for a simple scalability model. But MongoDB still support ad hoc queries.

 

7.  Proper indexes will increase query and sort speeds by orders of magnitude; consequently, any system that supports ad hoc queries should also support secondary indexes. Secondary indexes in MongoDB are implemented as B-trees. B-tree indexes, also the default for most relational databases, are optimized for a variety of queries, including range scans and queries with sort clauses. By permitting multiple secondary indexes, MongoDB allows users to optimize for a wide variety of queries. With MongoDB, you can create up to 64 indexes per collection.

 

8.  MongoDB provides database replication via a topology known as a replica set. Replica sets distribute data across machines for redundancy and automate failover in the event of server and network outages. Replica sets consist of exactly one primary node and one or more secondary nodes. A replica set’s primary node can accept both reads and writes, but the secondary nodes are read-only. If the primary node fails, the cluster will pick a secondary node and automatically promote it to primary. When the former primary comes back online, it’ll do so as a secondary.

 

9.  In MongoDB, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling. All writes, by default, are fire-and-forget, which means that these writes are sent across a TCP socket without requiring a database response. If users want a response, they can issue a write using a special safe mode provided by all drivers. This forces a response, ensuring that the write has been received by the server with no errors. Safe mode is configurable; it can also be used to block until a write has been replicated to some number of servers.

 

10.  In MongoDB v2.0, journaling is enabled by default. With journaling, every write is committed to an append-only log. If the server is ever shut down uncleanly (say, in a power outage), the journal will be used to ensure that MongoDB’s data files are restored to a consistent state when you restart the server.

 

11.  The technique of augmenting a single node’s hardware for scale is known as vertical scaling or scaling up. Vertical scaling has the advantages of being simple, reliable, and cost-effective up to a certain point. Instead of beefing up a single node, scaling horizontally (or scaling out) means distributing the database across multiple machines. The distribution of data across machines mitigates the consequences of failure.

 

12. MongoDB has been designed to make horizontal scaling manageable. It does so via a range-based partitioning mechanism, known as auto-sharding, which automatically manages the distribution of data across nodes. The sharding system handles the addition of shard nodes, and it also facilitates automatic failover. Individual shards are made up of a replica set consisting of at least two nodes, ensuring automatic recovery with no single point of failure. All this means that no application code has to handle these logistics; your application code communicates with a sharded cluster just as it speaks to a single node.

 

13. MongoDB is written in C++. The project compiles on all major operating systems, including Mac OS X, Windows, and most flavors of Linux. MongoDB is open source and licensed under the GNU-AGPL. The source code is freely available on GitHub. The project is guided by the 10gen core server team.

 

14. The core database server runs via an executable called mongod (mongodb.exe on Windows). The mongod server process receives commands over a network socket using a custom binary protocol. All the data files for a mongod process are stored by default in /data/db(c:\data\db on Windows)

 

15. mongod can be run in several modes, the most common of which is as a member of a replica set. Replica set configurations generally consist of two replicas, plus an arbiter process residing on a third server. For MongoDB’s auto-sharding architecture, the components consist of mongod processes configured as per-shard replica sets, with special metadata servers, known as config servers, on the side. A separate routing server called mongos is also used to send requests to the appropriate shard.

 

16. The MongoDB command shell is a JavaScript-based tool for administering the database and manipulating data. The mongo executable loads the shell and connects to a specified mongod process. Most commands are issued using JavaScript expressions. The shell also permits you to run administrative commands. Some examples include viewing the current database operation, checking the status of replication to a secondary node, and configuring a collection for sharding.

17. All documents require a primary key stored in the _id field. You’re allowed to enter a custom _id as long as you can guarantee its uniqueness. But if you omit the _id altogether, then a MongoDB object ID will be inserted automatically.

18. All of the MongoDB drivers implement similar methods for saving a document to a collection, but the representation of the document itself will usually be whatever is most natural to each language. In Java you represent documents using a special document builder class that implements LinkedHashMap.

19. Little abstraction beyond the driver itself is required to build an application. Many developers like using a thin wrapper(Morphia for Java) over the drivers to handle associations, validations, and type checking.

 

20. MongoDB is bundled with several command-line utilities:
  a)  mongodump and mongorestore—Standard utilities for backing up and restoring a database. mongodump saves the database’s data in its native BSON format; this tool has the advantage of being usable for hot backups which can easily be restored with mongorestore.
  b)  mongoexport and mongoimport—These utilities export and import JSON, CSV, and TSV data;
  c)  mongosniff—A wire-sniffing tool for viewing operations sent to the database. Essentially translates the BSON going over the wire to human-readable shell statements.
  d)  mongostat—Similar to iostat; constantly polls MongoDB and the system to provide helpful stats, including the number of operations per second, the amount of virtual memory allocated, and the number of connections to the server.d

  Please reference http://docs.mongodb.org/manual/reference/ for details.

21. MongoDB is well suited as a primary data store for web applications, for analytics and logging applications, and for any application requiring a medium-grade cache. In addition, because it easily stores schemaless data, MongoDB is also good for capturing data whose structure can’t be known in advance.

22. The best-known simple key-value store is memcached (pronounced mem-cash-dee). Memcached stores its data in memory only, so it trades persistence for speed. It’s also distributed; memcached nodes running across multiple servers can act as a single data store, eliminating the complexity of maintaining cache state across machines. Compared with MongoDB, a simple key-value store like memcached will often allow for faster reads and writes. But unlike MongoDB, these systems can rarely act as primary data stores. Simple key-value stores are best used as adjuncts, either as caching layers atop a more traditional database or as simple persistence layers for ephemeral services like job queues.

 

23. Sophisticated key-value stores manage a relatively self-contained domain that demands significant storage and availability. Because of their masterless architecture, these systems scale easily with the addition of nodes. They opt for eventual consistency, which means that reads don’t necessarily reflect the latest write. But what users get in exchange for weaker consistency is the ability to write in the face of any one node’s failure. This contrasts with MongoDB, which provides strong consistency, a single master (per shard), a richer data model, and secondary indexes.

 

24. MongoDB and RDMS are both capable of representing a rich data model, although where RDMS uses fixed-schema tables, MongoDB has schema-free documents. RDBMS and MongoDB both support B-tree indexes. RDMS supports both joins and transactions, so if you must use SQL or if you require transactions, then you’ll need to use RDBMS. That said, MongoDB’s document model is often rich enough to represent objects without requiring joins. And its updates can be applied atomically to individual documents, providing a subset of what’s possible with traditional transactions. Both MongoDB and RDBMS support replication. MongoDB has been designed to scale horizontally, with sharding and failover handled automatically. Any sharding on RDBMS has to be managed manually, and given the complexity involved, it’s more common to see a vertically scaled RDBMS system.

25. MongoDB works well for analytics and logging. MongoDB’s relevance to analytics derives from its speed and from two key features: targeted atomic updates and capped collections. Atomic updates let clients efficiently increment counters and push values onto arrays. Capped collections, often useful for logging, feature fixed allocation, which lets them age out automatically. Storing logging data in a database, as compared with the file system, provides easier organization and much greater query power. Now, instead of using grep or a custom log search utility, users can employ the MongoDB query language they know and love to examine log output.

26. MongoDB should usually be run on 64-bit machines. 32-bit systems are capable of addressing only 4 GB of memory. Acknowledging that typically half of this memory will be allocated by the operating system and program processes, this leaves just 2 GB of memory on which to map the data files. So if you’re running 32-bit, and if you have even a modest number of indexes defined, you’ll be strictly limited to as little as 1.5 GB of data.

 

27. A second consequence of using virtual memory mapping is that memory for the data will be allocated automatically, as needed. This makes it trickier to run the database in a shared environment. As with database servers in general, MongoDB is best run on a dedicated server.

 

28. It’s important to run MongoDB with replication, especially if you’re not running with journaling enabled. Because MongoDB uses memory-mapped files, any unclean shutdown of a mongod not running with journaling may result in corruption. Therefore, it’s necessary in this case to have a replicated backup available for failover.

 

Why MongoDB?

  • Document-oriented
    • Documents (objects) map nicely to programming language data types
    • Embedded documents and arrays reduce need for joins
    • Dynamically-typed (schemaless) for easy schema evolution
    • No joins and no multi-document transactions for high performance and easy scalability
  • High performance
    • No joins and embedding makes reads and writes fast
    • Indexes including indexing of keys from embedded documents and arrays
    • Optional streaming writes (no acknowledgements)
  • High availability
    • Replicated servers with automatic master failover
  • Easy scalability
    • Automatic sharding (auto-partitioning of data across servers)
      • Reads and writes are distributed over shards
      • No joins or multi-document transactions make distributed queries easy and fast
    • Eventually-consistent reads can be distributed over replicated servers
  • Rich query language

Large MongoDB deployment

1.       One or more shards, each shard holds a portion of the total data (managed automatically). Reads and writes are automatically routed to the appropriate shard(s). Each shard is backed by a replica set – which just holds the data for that shard.

2.       A replica set is one or more servers, each holding copies of the same data. At any given time one is primary and the rest are secondaries. If the primary goes down one of the secondaries takes over automatically as primary. All writes and consistent reads go to the primary, and all eventually consistent reads are distributed amongst all the secondaries.

3.       Multiple config servers, each one holds a copy of the meta data indicating which data lives on which shard.

4.       One or more routers, each one acts as a server for one or more clients. Clients issue queries/updates to a router and the router routes them to the appropriate shard while consulting the config servers.

5.       One or more clients, each one is (part of) the user's application and issues commands to a router via the mongo client library (driver) for its language.

mongod is the server program (data or config). mongos is the router program.


Small deployment (no partitioning)

1.       One replica set (automatic failover), or one server with zero or more slaves (no automatic failover).

2.       One or more clients issuing commands to the replica set as a whole or the single master (the driver will manage which server in the replica set to send to).

Mongo data model

  • A Mongo system (see deployment above) holds a set of databases
  • A database holds a set of collections
  • A collection holds a set of documents
  • A document is a set of fields
  • A field is a key-value pair
  • A key is a name (string)
  • A value is a
    • basic type like string, integer, float, timestamp, binary, etc.,
    • a document, or
    • an array of values
  • 大小: 10.5 KB
分享到:
评论

相关推荐

    Seven Databases in Seven Weeks - Luc Perkins

    After a brief introduction, this book tackles a series of seven databases chapter by chapter. The databases were chosen to span five different database genres or styles, which are discussed in Chapter...

    camel-manual-2.0

    Here’s a brief walkthrough of a sample code snippet to give a sense of how Camel can be configured and used: ```java CamelContext context = new DefaultCamelContext(); ``` This line initializes a ...

    Manning.Spring.in.Action.4th.Edition.2014.11.epub

    Brief Table of Contents Table of Contents Praise for the Third Edition of Spring in Action Preface Acknowledgments About this Book 1. Core Spring Chapter 1. Springing into action 1.1. Simplifying Java...

    漫画作品与时间旅行题材.doc

    漫画作品与时间旅行题材

    基于SpringBoot框架的的在线视频教育平台的设计与实现(含完整源码+完整毕设文档+PPT+数据库文件).zip

    Spring Boot特点: 1、创建一个单独的Spring应用程序; 2、嵌入式Tomcat,无需部署WAR文件; 3、简化Maven配置; 4、自动配置Spring; 5、提供生产就绪功能,如指标,健康检查和外部配置; 6、绝对没有代码生成和XML的配置要求;第一章 绪 论 1 1.1背景及意义 1 1.2国内外研究概况 2 1.3 研究的内容 2 第二章 关键技术的研究 3 2.1 相关技术 3 2.2 Java技术 3 2.3 ECLIPSE 开发环境 4 2.4 Tomcat介绍 4 2.5 Spring Boot框架 5 第三章 系统分析 5 3.1 系统设计目标 6 3.2 系统可行性分析 6 3.3 系统功能分析和描述 7 3.4系统UML用例分析 8 3.4.1管理员用例 9 3.4.2用户用例 9 3.5系统流程分析 10 3.5.1添加信息流程 11 3.5.2操作流程 12 3.5.3删除信息流程 13 第四章 系统设计 14 4.1 系统体系结构 15 4.2 数据库设计原则 16 4.3 数据表 17 第五章 系统实现 18 5.1用户功能模块 18 5.2

    PyTorch入门指南:从零开始掌握深度学习框架.pdf

    内容概要:本文作为PyTorch的入门指南,首先介绍了PyTorch相较于TensorFlow的优势——动态计算图、自动微分和丰富API。接着讲解了环境搭建、PyTorch核心组件如张量(Tensor)、autograd模块以及神经网络的定义方式(如nn.Module),并且给出了详细的神经网络训练流程,包括前向传播、计算损失值、进行反向传播以计算梯度,最终调整权重参数。此外还简要提及了一些拓展资源以便进一步探索这个深度学习工具。 适用人群:初次接触深度学习技术的新学者和技术爱好者,有一定程序基础并希望通过PyTorch深入理解机器学习算法实现的人。 使用场景及目标:该文档有助于建立使用者对于深度学习及其具体实践有更加直观的理解,在完成本教程之后,读者应当能够在个人设备上正确部署Python环境,并依据指示独立创建自己的简易深度学习项目。 其他说明:文中所提及的所有示例均可被完整重现,同时官方提供的资料链接也可以方便有兴趣的人士对感兴趣之处继续挖掘,这不仅加深了对PyTorch本身的熟悉程度,也为未来的研究或者工程项目打下了良好的理论基础和实践经验。

    古镇美食自驾游:舌尖上的历史韵味.doc

    古镇美食自驾游:舌尖上的历史韵味

    基于人工神经网络(ANN)的高斯白噪声的系统识别 附Matlab代码.rar

    1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

    漫画作品与神话传说融合.doc

    漫画作品与神话传说融合

    实时电价机制下交直流混合微网优化运行方法 附Matlab代码.rar

    1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

    ADC推理软件AI程序

    ADC推理软件AI程序

    漫画作品与科幻元素融合.doc

    漫画作品与科幻元素融合

    【电缆】中压电缆局部放电的传输模型研究 附Matlab代码.rar

    1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

    基于人工神经网络的类噪声环境声音声学识别 附Matlab代码.rar

    1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

    多约束、多车辆VRP问题 附Matlab代码.rar

    1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

    基于麻雀搜索算法(SSA)优化长短期记忆神经网络参数SSA-LSTM冷、热、电负荷预测 附Python代码.rar

    1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

    java-springboot+vue景区民宿预约系统实现源码(完整前后端+mysql+说明文档+LunW+PPT).zip

    java-springboot+vue景区民宿预约系统实现源码(完整前后端+mysql+说明文档+LunW+PPT).zip

    56页-智慧园区解决方案(伟景行).pdf

    在智慧城市建设的大潮中,智慧园区作为其中的璀璨明珠,正以其独特的魅力引领着产业园区的新一轮变革。想象一下,一个集绿色、高端、智能、创新于一体的未来园区,它不仅融合了科技研发、商业居住、办公文创等多种功能,更通过深度应用信息技术,实现了从传统到智慧的华丽转身。 智慧园区通过“四化”建设——即园区运营精细化、园区体验智能化、园区服务专业化和园区设施信息化,彻底颠覆了传统园区的管理模式。在这里,基础设施的数据收集与分析让管理变得更加主动和高效,从温湿度监控到烟雾报警,从消防水箱液位监测到消防栓防盗水装置,每一处细节都彰显着智能的力量。而远程抄表、空调和变配电的智能化管控,更是在节能降耗的同时,极大地提升了园区的运维效率。更令人兴奋的是,通过智慧监控、人流统计和自动访客系统等高科技手段,园区的安全防范能力得到了质的飞跃,让每一位入驻企业和个人都能享受到“拎包入住”般的便捷与安心。 更令人瞩目的是,智慧园区还构建了集信息服务、企业服务、物业服务于一体的综合服务体系。无论是通过园区门户进行信息查询、投诉反馈,还是享受便捷的电商服务、法律咨询和融资支持,亦或是利用云ERP和云OA系统提升企业的管理水平和运营效率,智慧园区都以其全面、专业、高效的服务,为企业的发展插上了腾飞的翅膀。而这一切的背后,是大数据、云计算、人工智能等前沿技术的深度融合与应用,它们如同智慧的大脑,让园区的管理和服务变得更加聪明、更加贴心。走进智慧园区,就像踏入了一个充满无限可能的未来世界,这里不仅有科技的魅力,更有生活的温度,让人不禁对未来充满了无限的憧憬与期待。

    边境自驾游异国风情深度体验.doc

    边境自驾游异国风情深度体验

    武汉东湖高新集团智慧园区 22页PPT(21页).pptx

    在智慧城市建设的大潮中,智慧园区作为其中的璀璨明珠,正以其独特的魅力引领着产业园区的新一轮变革。想象一下,一个集绿色、高端、智能、创新于一体的未来园区,它不仅融合了科技研发、商业居住、办公文创等多种功能,更通过深度应用信息技术,实现了从传统到智慧的华丽转身。 智慧园区通过“四化”建设——即园区运营精细化、园区体验智能化、园区服务专业化和园区设施信息化,彻底颠覆了传统园区的管理模式。在这里,基础设施的数据收集与分析让管理变得更加主动和高效,从温湿度监控到烟雾报警,从消防水箱液位监测到消防栓防盗水装置,每一处细节都彰显着智能的力量。而远程抄表、空调和变配电的智能化管控,更是在节能降耗的同时,极大地提升了园区的运维效率。更令人兴奋的是,通过智慧监控、人流统计和自动访客系统等高科技手段,园区的安全防范能力得到了质的飞跃,让每一位入驻企业和个人都能享受到“拎包入住”般的便捷与安心。 更令人瞩目的是,智慧园区还构建了集信息服务、企业服务、物业服务于一体的综合服务体系。无论是通过园区门户进行信息查询、投诉反馈,还是享受便捷的电商服务、法律咨询和融资支持,亦或是利用云ERP和云OA系统提升企业的管理水平和运营效率,智慧园区都以其全面、专业、高效的服务,为企业的发展插上了腾飞的翅膀。而这一切的背后,是大数据、云计算、人工智能等前沿技术的深度融合与应用,它们如同智慧的大脑,让园区的管理和服务变得更加聪明、更加贴心。走进智慧园区,就像踏入了一个充满无限可能的未来世界,这里不仅有科技的魅力,更有生活的温度,让人不禁对未来充满了无限的憧憬与期待。

Global site tag (gtag.js) - Google Analytics