
A Comparison of group(), mapReduce() and aggregate() in MongoDB

 

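In SQL, listing the number (no) of every member of each team in the users table can be written informally as follows (strict SQL would need an aggregate over no, for example MySQL's GROUP_CONCAT(no)):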

 

    SELECT team, no FROM users GROUP BY team                             (1)

 

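In MongoDB, however, achieving the same thing is more complicated.

As of MongoDB 2.2, three functions can do this. In the order they were introduced, they are group(), mapReduce() and aggregate().

What are the differences between them? The following is summarized from the Stack Overflow discussion at http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce: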

 

1.     db.collection.group().

Definition:

 

    db.collection.group({
        key: <document>,
        reduce: <function>,
        initial: <document>,
        keyf: <function>,      // alternative to key
        cond: <document>,
        finalize: <function>
    })

 

Characteristics:

  • Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
  • Returns result set inline (as an array of grouped items).
  • Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
  • Current Limitations
    • Will not group into a result set with more than 10,000 keys (the limit was raised to 20,000 in MongoDB 2.2).
    • Results must fit within the limitations of a BSON document (currently 16 MB).
    • Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
    • Does not work with sharded collections.

Ex: to implement statement (1):

    db.users.group({key: {team: 1}, initial: {members: []},
                    reduce: function (cur, result) { result.members.push(cur.no); }});
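To make the roles of key, initial, reduce and finalize more concrete, here is a minimal plain-JavaScript sketch of the grouping logic. It is only an in-memory model and not how MongoDB implements group(). The names groupInMemory and users are made up for illustration, and cond is simplified to a predicate function (in MongoDB it is a query document).

```javascript
// Plain-JavaScript model of group() semantics. NOT MongoDB's implementation.
// groupInMemory and the sample data are hypothetical names for illustration.
function groupInMemory(docs, spec) {
  var buckets = {};   // serialized key -> accumulator document
  var order = [];     // remembers the order in which keys were first seen

  docs.forEach(function (doc) {
    // Assumption: cond is a predicate here; MongoDB takes a query document.
    if (spec.cond && !spec.cond(doc)) { return; }

    // Build the group key from the fields named in spec.key.
    var keyDoc = {};
    Object.keys(spec.key).forEach(function (field) { keyDoc[field] = doc[field]; });
    var id = JSON.stringify(keyDoc);

    if (!Object.prototype.hasOwnProperty.call(buckets, id)) {
      // Each group starts with the key fields plus its own copy of `initial`.
      var acc = JSON.parse(id);
      var init = JSON.parse(JSON.stringify(spec.initial));
      Object.keys(init).forEach(function (field) { acc[field] = init[field]; });
      buckets[id] = acc;
      order.push(id);
    }
    spec.reduce(doc, buckets[id]);   // reduce mutates the accumulator in place
  });

  return order.map(function (id) {
    if (spec.finalize) { spec.finalize(buckets[id]); }   // optional post-processing
    return buckets[id];
  });
}

var users = [
  {team: "A", no: 1},
  {team: "B", no: 2},
  {team: "A", no: 3}
];

var grouped = groupInMemory(users, {
  key: {team: 1},
  initial: {members: []},
  reduce: function (cur, result) { result.members.push(cur.no); }
});

console.log(JSON.stringify(grouped));
// prints [{"team":"A","members":[1,3]},{"team":"B","members":[2]}]
```

The result comes back as one array, which matches the "returns result set inline" characteristic listed above.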

 

2.     db.collection.mapReduce().

mapReduce was reportedly added to cater to the popularity of the MapReduce model.

 

    db.collection.mapReduce(
        <mapfunction>,
        <reducefunction>,
        {
            out: <collection>,
            query: <document>,
            sort: <document>,
            limit: <number>,
            finalize: <function>,
            scope: <document>,
            jsMode: <boolean>,
            verbose: <boolean>
        }
    )

 

Characteristics:

  • Implements the MapReduce model for processing large data sets.
  • Can choose from one of several output options (inline, new collection, merge, replace, reduce)
  • MapReduce functions are written in JavaScript.
  • Supports non-sharded and sharded input collections.
  • Can be used for incremental aggregation over large collections.
  • MongoDB 2.2 implements much better support for sharded map reduce output.
  • Current Limitations
    • There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a time. However, most steps of the MapReduce are very short, so locks can be yielded frequently.
    • MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
    • MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.

Because it relies on the JavaScript engine, it is relatively slow. For details, see http://technicaldebt.com/?p=1157

 

Ex: to implement statement (1):

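    // Emit the same shape that reduce returns, so reduce can safely be re-applied to its own output.
    var map = function () { emit(this.team, {members: [this.no]}); };
    var reduce = function (key, values) {
        return {members: [].concat.apply([], values.map(function (v) { return v.members; }))};
    };
    db.users.mapReduce(map, reduce, {out: "team_member"});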
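One thing worth keeping in mind when writing the reduce function: MongoDB may call reduce more than once for the same key, feeding earlier outputs back in as inputs, and it does not call reduce at all for a key with a single value. The sketch below models that behaviour in plain JavaScript so the effect can be seen without a server. It is an illustration only: mapReduceInMemory and chunkSize are hypothetical, and emit is passed to map as an argument here, whereas in MongoDB it is a global function.

```javascript
// Hypothetical in-memory model of mapReduce. NOT MongoDB's implementation.
// chunkSize forces reduce to run on partial batches, as can happen on a server.
function mapReduceInMemory(docs, map, reduce, chunkSize) {
  chunkSize = Math.max(2, chunkSize || 2);   // at least 2, so every round shrinks the list
  var emitted = {};                          // key -> list of emitted values
  var emit = function (key, value) {
    (emitted[key] = emitted[key] || []).push(value);
  };
  docs.forEach(function (doc) { map.call(doc, emit); });

  var results = {};
  Object.keys(emitted).forEach(function (key) {
    var values = emitted[key];
    // A key with a single value never reaches reduce.
    while (values.length > 1) {
      var partials = [];
      for (var i = 0; i < values.length; i += chunkSize) {
        var chunk = values.slice(i, i + chunkSize);
        partials.push(chunk.length > 1 ? reduce(key, chunk) : chunk[0]);
      }
      values = partials;   // earlier outputs become inputs of the next round
    }
    results[key] = values[0];
  });
  return results;
}

var sampleUsers = [
  {team: "A", no: 1}, {team: "A", no: 2}, {team: "A", no: 3}, {team: "B", no: 4}
];

// A reduce that merely wraps its inputs breaks once it is re-applied.
var wrapped = mapReduceInMemory(
  sampleUsers,
  function (emit) { emit(this.team, this.no); },
  function (key, values) { return {team: key, members: values}; },
  2
);

// A reduce whose output has the same shape as the emitted values stays correct.
var merged = mapReduceInMemory(
  sampleUsers,
  function (emit) { emit(this.team, {members: [this.no]}); },
  function (key, values) {
    var out = {members: []};
    values.forEach(function (v) { out.members = out.members.concat(v.members); });
    return out;
  },
  2
);

console.log(JSON.stringify(wrapped));
console.log(JSON.stringify(merged));
```

With this sample data the wrapping version ends up with a nested object inside members for team A and a bare number for team B, while the merging version yields a flat members array for both teams.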

 

3.     db.collection.aggregate().

For simpler tasks, mapReduce is a big hammer. The aggregation framework avoids the overhead of the JavaScript engine and can also select matching subdocuments and arrays. It is implemented as a pipeline in C++.

The pipeline defines the following operators:

$match – query predicate as a filter.

$project – use a sample document to determine the shape of the result.

$unwind – hands out array elements one at a time.

$group – aggregates items into buckets defined by a key.

$sort – sorts documents.

$limit – allows the specified number of documents to pass.

$skip – skips over the specified number of documents.
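To give a feel for how such stages compose, here is a rough plain-JavaScript sketch in which each stage is a function from an array of documents to a new array, and the pipeline simply chains them. This is only a mental model, not the aggregation framework itself. The names stages, runPipeline and teams are invented for the example, and the stage arguments are simplified (a predicate instead of a query document, a field name instead of a sort specification).

```javascript
// Hypothetical in-memory stand-ins for a few pipeline stages.
// NOT the aggregation framework; arguments are simplified for illustration.
var stages = {
  // Like $match: keep only documents that satisfy the predicate.
  match: function (pred) {
    return function (docs) { return docs.filter(pred); };
  },
  // Like $unwind: emit one copy of the document per array element.
  unwind: function (field) {
    return function (docs) {
      var out = [];
      docs.forEach(function (doc) {
        (doc[field] || []).forEach(function (item) {
          var copy = Object.assign({}, doc);
          copy[field] = item;
          out.push(copy);
        });
      });
      return out;
    };
  },
  // Like $sort (ascending on a single field).
  sortBy: function (field) {
    return function (docs) {
      return docs.slice().sort(function (a, b) {
        return a[field] < b[field] ? -1 : (a[field] > b[field] ? 1 : 0);
      });
    };
  },
  // Like $skip and $limit.
  skip: function (n) { return function (docs) { return docs.slice(n); }; },
  limit: function (n) { return function (docs) { return docs.slice(0, n); }; }
};

// Feed the output of each stage into the next one.
function runPipeline(docs, pipeline) {
  return pipeline.reduce(function (current, stage) { return stage(current); }, docs);
}

var teams = [
  {team: "A", members: [3, 1]},
  {team: "B", members: [2]},
  {team: "C", members: []}
];

var piped = runPipeline(teams, [
  stages.match(function (d) { return d.team !== "C"; }),
  stages.unwind("members"),
  stages.sortBy("members"),
  stages.skip(1),
  stages.limit(1)
]);

console.log(JSON.stringify(piped));
// prints [{"team":"B","members":2}]
```

Note how the unwind step turns two input documents into three, which is why a stage does not have to produce exactly one output per input.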

Characteristics:

  • New feature in the MongoDB 2.2.0 production release (August, 2012).
  • Designed with specific goals of improving performance and usability.
  • Returns result set inline.
  • Supports non-sharded and sharded input collections.
  • Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
  • Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
  • Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
  • Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
  • Current Limitations
    • Results are returned inline, so are limited to the maximum document size supported by the server (16 MB).
    • Doesn't support as many output options as MapReduce
    • Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)
    • Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.

Ex: to implement statement (1):

    db.users.aggregate([{$project: {team: 1, no: 1}},
                        {$group: {_id: "$team", members: {$addToSet: "$no"}}}]);
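A small caveat about the accumulator used here: $addToSet keeps only distinct values, whereas $push (like the push() call in the group() example) keeps every value. If two members of a team could share the same no, the two would give different results. The plain-JavaScript sketch below models the difference in memory. groupStage and its output field names are hypothetical, and MongoDB does not guarantee the order of elements produced by $addToSet.

```javascript
// Hypothetical in-memory model of a $group stage with two accumulators.
// NOT MongoDB's implementation; field names are made up for illustration.
function groupStage(docs, keyField, valueField) {
  var byKey = {};
  docs.forEach(function (doc) {
    var key = doc[keyField];
    var bucket = byKey[key] || (byKey[key] = {_id: key, pushed: [], addedToSet: []});
    var value = doc[valueField];
    bucket.pushed.push(value);                    // like $push: keeps every value
    if (bucket.addedToSet.indexOf(value) === -1) {
      bucket.addedToSet.push(value);              // like $addToSet: distinct values only
    }
  });
  return Object.keys(byKey).map(function (key) { return byKey[key]; });
}

var roster = [
  {team: "A", no: 1},
  {team: "A", no: 1},   // duplicate number within team A
  {team: "A", no: 2},
  {team: "B", no: 3}
];

var buckets = groupStage(roster, "team", "no");
console.log(JSON.stringify(buckets));
// prints [{"_id":"A","pushed":[1,1,2],"addedToSet":[1,2]},{"_id":"B","pushed":[3],"addedToSet":[3]}]
```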

 

Refs:

http://docs.mongodb.org/manual/aggregation/#Aggregation-Examples

http://docs.mongodb.org/manual/reference/method/db.collection.group/

http://technicaldebt.com/?p=1157

http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce

 