C_J

Getting Started with MongoDB


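Preface:

    Lately I have often been talking with classmates about non-relational databases, and today I happened to read robbin's views on the field, which got me excited.

    The rise of Web 2.0 exposed the weaknesses of relational databases and drove the development of non-relational databases.

    Web applications emphasize high read/write throughput, massive data storage, and horizontal scaling. As robbin says, the strengths of relational databases (transactional consistency and multi-table queries) have little use in web applications.

    To handle high read/write loads, these systems sacrifice consistency: they operate in memory and flush to the file system asynchronously;

    to handle massive data, they implement their own file systems;

    to scale horizontally, they need clusters whose nodes can be plugged in and out.

Each non-relational database has its own strengths: MongoDB can handle massive data, TC/TT (Tokyo Cabinet/Tokyo Tyrant) offers very good high-concurrency read/write performance, and Cassandra is suited to clusters (it is more like a set of network services than a database).

Personally, I think massive data combined with high concurrency for users inevitably means clustering. So even though MongoDB supports massive storage well, I don't know how well it handles clustering, which makes Cassandra worth studying.

About one of the NoSQL products: MongoDB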

 

MongoDB—Kyle Banker—10gen

MongoDB is a JSON document-oriented database. Documents are stored in the database as BSON (binary JSON). BSON is efficient, fast, and richer in types than JSON (e.g. regex support). Documents are grouped in collections, which are analogous to relational tables but are schema-free.

GridFS is a specification for storing large binary files such as images and videos in MongoDB. Every document has a 4MB limit, so GridFS splits large files into chunks that fit under that limit and stores them in one collection, with a separate metadata collection. MusicNation.com stores all music and video alongside the application data in MongoDB (about 1TB).

MongoDB has its own wire protocol with socket drivers for several languages. The drivers serialize the data to BSON before transfer.

Replication is used for failover and redundancy. Most commonly a master-slave setup is used. It's also possible to set up a replica pair architecture.

MongoDB provides a custom query language which should be as powerful as SQL. MongoDB understands the internal structures of its documents which enables dynamic queries. Map/reduce functions are also supported in the query language.

BusinessInsider.com has been using MongoDB for two years with 12M page views/month. They like the simplification of the data model. Posts, for instance, have embedded comments. They also store real-time analytics in MongoDB, which enables fast inserts and eases data analysis with dynamic queries. They use a single MongoDB database server, 3 Apache web servers, and Memcached caching only on the front page.

TweetCongress.org uses MongoDB and likes that the code defines the schema, so the schema can be version controlled. They use a single master with snapshots on a 64-bit EC2 instance.

SourceForge.net had a large redesign this summer where they moved to MongoDB. Their goal was to store the front pages, project pages, and download pages in a single document. It’s deployed with one master and 5-6 read-only slaves (obviously scaled for reads and reliability).

 

 

Download

The easiest (and recommended) way to install MongoDB is to use the pre-built binaries.

32-bit binaries

Download and extract the 32-bit .zip. The "Production" build is recommended.

64-bit binaries

Download and extract the 64-bit .zip.

Note: 64-bit is recommended, although you must have a 64-bit version of Windows to run that version.

Unzip

Unzip the downloaded binary package to the location of your choice. You may want to rename mongo-xxxxxxx to just "mongo" for convenience.

Create a data directory

By default MongoDB will store data in C:\data\db, but it won't automatically create that folder, so we do so here:

C:\> mkdir \data
C:\> mkdir \data\db

Or you can do this from the Windows Explorer, of course.

Run and connect to the server

The important binaries for a first run are:

  • mongod.exe - the database server
  • mongo.exe - the administrative shell

To run the database, click mongod.exe in Explorer, or run it from a CMD window.

C:\> cd \my_mongo_dir\bin
C:\my_mongo_dir\bin > mongod

Getting A Database Connection

Let's now try manipulating the database with the database shell. (We could perform similar operations from any programming language using an appropriate driver. The shell is convenient for interactive and administrative use.)

Start the MongoDB JavaScript shell with:

C:\my_mongo_dir\bin> mongo

Connect to a database server running locally on the default port:

mongodb://localhost

Connect and login to the admin database as user "fred" with password "foobar":

mongodb://fred:foobar@localhost

Connect and login to the "baz" database as user "fred" with password "foobar":

mongodb://fred:foobar@localhost/baz

Connect to a replica pair, with one server on example1.com and another server on example2.com:

mongodb://example1.com:27017,example2.com:27017

Connect to a replica set with three servers running on localhost (on ports 27017, 27018, and 27019):

mongodb://localhost,localhost:27018,localhost:27019
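
To make the structure of these connection strings concrete, here is a minimal sketch that splits one into user, password, host list, and database. The helper name parseMongoUri is hypothetical and is not part of any MongoDB driver; it only covers the forms shown above and assumes 27017 as the default port, as the replica set example implies.

```javascript
// Hypothetical helper, not part of any MongoDB driver: splits a
// mongodb:// URI of the forms shown above into its parts.
function parseMongoUri(uri) {
  const match = /^mongodb:\/\/(?:([^:@\/]+):([^@\/]*)@)?([^\/]+)(?:\/(.*))?$/.exec(uri);
  if (!match) {
    throw new Error("not a mongodb:// URI: " + uri);
  }
  const user = match[1];
  const password = match[2];
  const hostList = match[3];
  const database = match[4];

  // Each comma-separated entry is host or host:port.
  const hosts = hostList.split(",").map(function (entry) {
    const parts = entry.split(":");
    // Assumption: 27017 is used when no port is given.
    return { host: parts[0], port: parts[1] ? Number(parts[1]) : 27017 };
  });

  return {
    user: user || null,
    password: password === undefined ? null : password,
    hosts: hosts,
    // With credentials but no database, the login goes to "admin" (see above).
    database: database || (user ? "admin" : null),
  };
}

console.log(parseMongoUri("mongodb://fred:foobar@localhost/baz"));
```

A real driver handles many more cases (options after "?", escaping, and so on), so this is only meant to show how the pieces of the URI relate to each other.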

 

 

 

"connecting to:" tells you the name of the database the shell is using. To switch databases, type:

> use mydb
switched to db mydb

To see a list of handy commands, type help.

Tip for Developers with Experience in Other Databases
You may notice, in the examples below, that we never create a database or collection. MongoDB does not require that you do so. As soon as you insert something, MongoDB creates the underlying collection and database. If you query a collection that does not exist, MongoDB treats it as an empty collection.

Switching to a database with the use command won't immediately create the database - the database is created lazily the first time data is inserted. This means that if you use a database for the first time it won't show up in the list provided by `show dbs` until data is inserted.

Each MongoDB server can support multiple databases. Each database is independent, and the data for each database is stored separately, for security and ease of management.

 

 

Using a Large Number of Collections 

 

Generally, having a large number of collections has no significant performance penalty, and results in very good performance.

Limits

By default MongoDB has a limit of approximately 24,000 namespaces per database.  Each collection counts as a namespace, as does each index.  Thus if every collection had one index, we can create up to 12,000 collections.  Use the --nssize parameter to set a higher limit.

Be aware that there is a certain minimum overhead per collection -- a few KB.  Further, any index will require at least 8KB of data space as the b-tree page size is 8KB.
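
The arithmetic behind these limits can be sketched as follows. The function names are hypothetical, and the figures (about 24,000 namespaces, 8KB per index) are the approximate defaults quoted above rather than exact values.

```javascript
// Approximate default namespace limit per database, as quoted above.
const DEFAULT_NAMESPACE_LIMIT = 24000;

// Each collection uses one namespace, plus one namespace per index on it.
function maxCollections(indexesPerCollection, namespaceLimit) {
  const limit = namespaceLimit === undefined ? DEFAULT_NAMESPACE_LIMIT : namespaceLimit;
  return Math.floor(limit / (1 + indexesPerCollection));
}

// Lower bound on index data space in KB, assuming one 8KB b-tree page per index.
function minIndexSpaceKB(collections, indexesPerCollection) {
  return collections * indexesPerCollection * 8;
}

console.log(maxCollections(1)); // one index per collection
console.log(maxCollections(3)); // three indexes per collection
```

With three indexes per collection, each collection consumes four namespaces, so the ceiling drops to a quarter of the namespace limit.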

--nssize

If more collections are required, run mongod with the --nssize parameter specified.  This will make the <database>.ns file larger and support more collections.  Note that --nssize sets the size used for newly created .ns files -- if you have an existing database and wish to resize, after running the db with --nssize, run the db.repairDatabase() command from the shell to adjust the size.

Maximum .ns file size is 2GB.

 

 

MongoDB (BSON) Data Types

Mongo uses special data types in addition to the basic JSON types of string, integer, boolean, double, null, array, and object. These types include date, object id, binary data, regular expression, and code. Each driver implements these types in language-specific ways; see your driver's documentation for details.

 

 

GridFS


GridFS is a specification for storing large files in MongoDB. All of the officially supported drivers implement the GridFS spec.
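
As a rough illustration of the idea, the sketch below splits a file into numbered chunk documents plus one metadata document, entirely in memory. It assumes Node.js (for Buffer); the function names are hypothetical, the chunk size is an arbitrary parameter, and the field names (files_id, n, data, length, chunkSize) follow the GridFS layout as I understand it. Real drivers do this for you and write the documents to two collections.

```javascript
// Hypothetical in-memory sketch of the GridFS layout: one metadata
// document plus numbered chunk documents. No database is involved.
function splitIntoChunks(fileId, filename, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // points back at the metadata document
      n: n,             // chunk sequence number, starting at 0
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  const file = { _id: fileId, filename: filename, length: data.length, chunkSize: chunkSize };
  return { file: file, chunks: chunks };
}

// Rebuild the original bytes by concatenating the chunks in order of n.
function reassemble(chunks) {
  const ordered = chunks.slice().sort(function (a, b) { return a.n - b.n; });
  return Buffer.concat(ordered.map(function (chunk) { return chunk.data; }));
}

const stored = splitIntoChunks(1, "song.mp3", Buffer.from("0123456789"), 4);
console.log(stored.file, stored.chunks.length);
```

Keeping the metadata separate from the chunks means a file can be listed or looked up without reading any of its data.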

 

 

JSON

For example, the following "document" can be stored in MongoDB:

{ author: 'joe',
  created : new Date('03/28/2009'),
  title : 'Yet another blog post',
  text : 'Here is the text...',
  tags : [ 'example', 'joe' ],
  comments : [ { author: 'jim', comment: 'I disagree' },
              { author: 'nancy', comment: 'Good post' }
  ]
}

This document is a blog post, so we can store in a "posts" collection using the shell:

> doc = { author : 'joe', created : new Date('03/28/2009'), ... }
> db.posts.insert(doc);

MongoDB understands the internals of BSON objects -- not only can it store them, it can query on internal fields and index keys based upon them.  For example the query

> db.posts.find( { "comments.author" : "jim" } )

is possible and means "find any blog post where at least one comment subobject has author == 'jim'".
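
The "at least one subobject" behaviour of a dotted key can be imitated in plain JavaScript. This is only a simplified sketch of the semantics, not how the server evaluates queries; matchesPath is a hypothetical helper that handles exact equality on scalar values.

```javascript
// Hypothetical helper: does `doc` hold `value` at the dotted `path`?
// When an array is met along the way, any one matching element is enough.
function matchesPath(doc, path, value) {
  const keys = path.split(".");
  function walk(node, depth) {
    if (Array.isArray(node)) {
      return node.some(function (item) { return walk(item, depth); });
    }
    if (depth === keys.length) {
      return node === value;
    }
    if (node === null || typeof node !== "object") {
      return false;
    }
    return walk(node[keys[depth]], depth + 1);
  }
  return walk(doc, 0);
}

const post = {
  author: "joe",
  tags: ["example", "joe"],
  comments: [
    { author: "jim", comment: "I disagree" },
    { author: "nancy", comment: "Good post" },
  ],
};

console.log(matchesPath(post, "comments.author", "jim"));
```

The same rule explains why a query on tags matches a post when any single element of the array equals the value.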

 

  

> j = { name : "mongo" };
{"name" : "mongo"}
> t = { x : 3 };
{ "x" : 3  }
> db.things.save(j);
> db.things.save(t);
> db.things.find();
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
>

 

 

 

Let's add some more records to this collection:

> for (var i = 1; i <= 20; i++) db.things.save({x : 4, j : i});
> db.things.find();
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
has more

Querying


One of MongoDB's best capabilities is its support for dynamic (ad hoc) queries. Systems that support dynamic queries don't require any special indexing to find data; users can find data using any criteria. For relational databases, dynamic queries are the norm. If you're moving to MongoDB from a relational database, you'll find that many SQL queries translate easily to MongoDB's document-based query language.

Suppose we want to return every document in the users collection. Our query would look like this:
  db.users.find({})

In this case, our selector is an empty document, which matches every document in the collection. Here's a more selective example:

  db.users.find({'last_name': 'Smith'})

Here our selector will match every document where the last_name attribute is 'Smith.'

Field Selection

In addition to the query expression, MongoDB queries can take some additional arguments. For example, it's possible to request only certain fields be returned. If we just wanted the social security numbers of users with the last name of 'Smith,' then from the shell we could issue this query:

  // retrieve ssn field for documents where last_name == 'Smith':
  db.users.find({last_name: 'Smith'}, {'ssn': 1});

  // retrieve all fields *except* the thumbnail field, for all documents:
  db.users.find({}, {thumbnail:0});

Note the _id field is always returned even when not explicitly requested.
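
The effect of the second argument can be sketched on a plain object. The project helper below is hypothetical and much simpler than the server's behaviour: it assumes the field document is either all inclusions or all exclusions, and it always keeps _id, as described above.

```javascript
// Hypothetical helper mimicking field selection on an in-memory document.
// `fields` maps names to 1 (include) or 0 (exclude); _id is always kept.
function project(doc, fields) {
  const including = Object.keys(fields).some(function (key) {
    return Boolean(fields[key]);
  });
  const result = {};
  Object.keys(doc).forEach(function (key) {
    const listed = Object.prototype.hasOwnProperty.call(fields, key);
    const keep = key === "_id" ||
      (including ? listed && Boolean(fields[key]) : !listed);
    if (keep) {
      result[key] = doc[key];
    }
  });
  return result;
}

// Sample data invented for illustration only.
const user = { _id: 1, last_name: "Smith", ssn: "123-45-6789", thumbnail: "..." };

console.log(project(user, { ssn: 1 }));       // only _id and ssn
console.log(project(user, { thumbnail: 0 })); // everything except thumbnail
```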

Sorting

MongoDB queries can return sorted results. To return all documents and sort by last name in ascending order, we'd query like so:

  db.users.find({}).sort({last_name: 1});

Skip and Limit

MongoDB also supports skip and limit for easy paging. Here we skip the first 20 last names, and limit our result set to 10:

db.users.find().skip(20).limit(10);
db.users.find({}, {}, 10, 20); // same as above, but less clear
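
For paging, the skip and limit values usually come from a page number and a page size. A minimal sketch, using hypothetical helper names and an in-memory array in place of a collection:

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit values.
function pageWindow(pageNumber, pageSize) {
  return { skip: (pageNumber - 1) * pageSize, limit: pageSize };
}

// Apply the window to an array, the way skip() and limit() act on a result set.
function applyWindow(items, pageInfo) {
  return items.slice(pageInfo.skip, pageInfo.skip + pageInfo.limit);
}

// Page 3 with 10 results per page gives skip 20, limit 10, as in the query above.
const thirdPage = pageWindow(3, 10);
console.log(thirdPage);

// In the shell this would be used as:
//   db.users.find().skip(thirdPage.skip).limit(thirdPage.limit);
```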

slaveOk

When querying a replica pair or replica set, drivers route their requests to the master mongod by default; to perform a query against an (arbitrarily-selected) slave, the query can be run with the slaveOk option. Here's how to do so in the shell:

db.getMongo().setSlaveOk(); // enable querying a slave
db.users.find(...)

Note: some language drivers permit specifying the slaveOk option on each find(), others make this a connection-wide setting. See your language's driver for details.

Cursors

Database queries, performed with the find() method, technically work by returning a cursor. Cursors are then used to iteratively retrieve all the documents returned by the query. For example, we can iterate over a cursor in the mongo shell like this:

> var cur = db.example.find();
> cur.forEach( function(x) { print(tojson(x))});
{"n" : 1 , "_id" : "497ce96f395f2f052a494fd4"}
{"n" : 2 , "_id" : "497ce971395f2f052a494fd5"}
{"n" : 3 , "_id" : "497ce973395f2f052a494fd6"}
>

Removing Objects from a Collection

To remove objects from a collection, use the remove() function in the mongo shell. (Other drivers offer a similar function, but may call the function "delete". Please check your driver's documentation.)

remove() is like find() in that it takes a JSON-style query document as an argument to select which documents are removed. If you call remove() without a document argument, or with an empty document {}, it will remove all documents in the collection. Some examples:

db.things.remove({});    // removes all
db.things.remove({n:1}); // removes all where n == 1

If you have a document in memory and wish to delete it, the most efficient method is to specify the document's _id value as the criterion:

db.things.remove({_id: myobject._id});

You may be tempted to simply pass the document you wish to delete as the selector, and this will work, but it's inefficient.

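The following examples each show a SQL statement followed by the equivalent query in the mongo shell:

SELECT * FROM things WHERE name="mongo"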
> db.things.find({name:"mongo"}).forEach(printjson);
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
SELECT * FROM things WHERE x=4
> db.things.find({x:4}).forEach(printjson);
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }

The query expression is a document itself. A query document of the form { a:A, b:B, ... } means "where a==A and b==B and ...". More information on query capabilities may be found in the Queries and Cursors section of the Mongo Developers' Guide.
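
The "and" reading of a query document can be sketched with an in-memory filter. The matchesAll helper is hypothetical and only handles exact equality on top-level fields; the sample array mirrors the things collection built earlier, without the _id values.

```javascript
// Hypothetical helper: true when every key in `query` equals the same key in `doc`.
function matchesAll(doc, query) {
  return Object.keys(query).every(function (key) {
    return doc[key] === query[key];
  });
}

// Same shape as the things collection above.
const things = [{ name: "mongo" }, { x: 3 }];
for (let i = 1; i <= 20; i++) {
  things.push({ x: 4, j: i });
}

// { x: 4 } behaves like WHERE x=4; { x: 4, j: 3 } like WHERE x=4 AND j=3.
const xIsFour = things.filter(function (doc) { return matchesAll(doc, { x: 4 }); });
const xAndJ = things.filter(function (doc) { return matchesAll(doc, { x: 4, j: 3 }); });
console.log(xIsFour.length, xAndJ.length);
```

An empty query document has no keys to check, so it matches every document, which is why find({}) returns the whole collection.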

 

 

To illustrate field selection, let's repeat the last example find({x:4}) with an additional argument that limits the returned documents to just the "j" elements:

SELECT j FROM things WHERE x=4
> db.things.find({x:4}, {j:true}).forEach(printjson);
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "j" : 20 }

Note that the "_id" field is always returned.

 

To retrieve a single document, the findOne() method is both convenient and efficient:

> printjson(db.things.findOne({name:"mongo"}));
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }

This is more efficient because the client requests a single object from the database, so less work is done by the database and the network. This is the equivalent of find({name:"mongo"}).limit(1).

 

 

You can cap the size of a query's result set with the limit() method. This is highly recommended for performance reasons, as it limits the work the database does and the amount of data returned over the network. For example:

> db.things.find().limit(3);
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }