An Explanation of "Come, Let Me Show You a Magical MongoDB mapReduce Operation!"

gong1208

Blog category: mongodb

Tags: mongodb, mapreduce, strange problem

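Hello everyone. Before reading this post, please be sure to read the previous one, "Come, Let Me Show You a Magical MongoDB mapReduce Operation!", at http://gong1208.iteye.com/blog/1830576

This post is an explanation of that one.

The strange error I reported in the previous post when running mapReduce on MongoDB turned out to be my own mistake. The cause is covered by a passage in the MongoDB documentation that describes the reduce function: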

Requirements for the reduce Function

The reduce function has the following prototype:

function(key, values) {
    ...
    return result;
}

The reduce function exhibits the following behaviors:

The reduce function should not access the database, even to perform read operations.
The reduce function should not affect the outside system.
MongoDB will not call the reduce function for a key that has only a single value.
The reduce function can access the variables defined in the scope parameter.

Because it is possible to invoke the reduce function more than once for the same key, the following properties need to be true:

the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operation is true:

reduce( key, [ C, reduce( key, [ A, B ] ) ] ) == reduce( key, [ C, A, B ] )

the reduce function must be idempotent. Ensure that the following statement is true:

reduce( key, [ reduce( key, valuesArray ) ] ) == reduce( key, valuesArray )

the order of the elements in the valuesArray should not affect the output of the reduce function, so that the following statement is true:

reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
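
The three properties quoted above can be checked locally in plain JavaScript before a job is submitted. The following is only a minimal sketch and is not taken from the MongoDB documentation: `sumReduce`, `same`, and the sample values `A`, `B`, `C` are hypothetical names, and results are compared through `JSON.stringify`, which assumes the objects are built with a fixed key order.

```javascript
// Hypothetical reducer: sums objects of the form {value: number}
// and returns an object of the same form.
function sumReduce(key, values) {
  var total = { value: 0 };
  values.forEach(function (v) {
    total.value += v.value;
  });
  return total;
}

// Structural comparison, adequate for these small objects.
function same(x, y) {
  return JSON.stringify(x) === JSON.stringify(y);
}

var A = { value: 1 }, B = { value: 2 }, C = { value: 3 };

// 1. Reducing a partial result again equals reducing everything at once.
var reReducible = same(
  sumReduce("k", [C, sumReduce("k", [A, B])]),
  sumReduce("k", [C, A, B])
);

// 2. Reducing a single, already reduced value changes nothing.
var idempotent = same(
  sumReduce("k", [sumReduce("k", [A, B])]),
  sumReduce("k", [A, B])
);

// 3. The order of the input values does not matter.
var orderIndependent = same(sumReduce("k", [A, B]), sumReduce("k", [B, A]));

console.log(reReducible, idempotent, orderIndependent);
```

A reducer whose return value has a different shape from its inputs would fail the first two checks.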

Reference: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce

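What this passage means is that the reduce function may be called more than once for the same key within a single job, rather than exactly once as we would expect of an ordinary function, so the reduce function must be idempotent. Put simply, the form of the values that reduce receives must be the same as the form of the value that reduce returns.

I will use the example from my previous post again.

This is how I originally wrote it: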

printjson("job start");

// map: emit one {value: 1} per document, keyed by IP address.
var map = function () {
  emit(this.ip, { value: 1 });
};

// reduce (original, incorrect version): reads v.value from each input
// but returns {count: ...}, a different form from what map emits.
var reduce = function (key, values) {
  var count = 0;
  values.forEach(function (v) {
    count += v['value'];
  });
  return { count: count };
};

var res = db.runCommand({ mapreduce: "RegistRecord", map: map, reduce: reduce, out: "log_results" });
printjson("job end");

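As you can see, the second argument of emit has the form {value: number}, so every element of the values array passed to reduce has the form {value: number}. The return value of reduce must therefore also have the form {value: number}, because reduce may receive its own return value as an input to a later reduce call.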
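
To make the failure concrete, here is a minimal sketch that calls the original reducer twice by hand in plain JavaScript, outside MongoDB. The key `"10.0.0.1"` and the split into a batch of two values followed by one more value are assumptions made for illustration; how MongoDB actually batches values is not described in this post.

```javascript
// The original reducer: it reads v.value but returns {count: ...}.
function buggyReduce(key, values) {
  var count = 0;
  values.forEach(function (v) {
    count += v["value"];
  });
  return { count: count };
}

// Values emitted by map for one hypothetical key.
var emitted = [{ value: 1 }, { value: 1 }, { value: 1 }];

// First call: reduce a partial batch of two values.
var partial = buggyReduce("10.0.0.1", emitted.slice(0, 2));

// Second call: the partial result comes back in as an input value.
// partial.value is undefined, so the running sum becomes NaN.
var result = buggyReduce("10.0.0.1", [partial, emitted[2]]);

console.log(JSON.stringify(partial), result.count);
```

The first call looks fine on its own, which is why the problem only shows up once a key has enough values to be reduced in more than one step.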

Changing it as follows makes it correct:

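// reduce (corrected version): the accumulator has the same form,
// {value: number}, as the values emitted by map.
var reduce = function (key, values) {
  var count = { value: 0 };
  values.forEach(function (v) {
    count.value += v['value'];
  });
  return count;
};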
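
As a rough check of the corrected version, the sketch below runs it in plain JavaScript through a simulated re-reduce with several batch sizes. `reduceInBatches` is a hypothetical helper written for this illustration, not a MongoDB function, and the key and the ten emitted values are made up.

```javascript
// Corrected reducer: input and output share the form {value: number}.
function fixedReduce(key, values) {
  var count = { value: 0 };
  values.forEach(function (v) {
    count.value += v["value"];
  });
  return count;
}

// Hypothetical helper: reduce `values` in batches of `batchSize`, feeding
// each partial result back in as the first element of the next batch.
function reduceInBatches(reduceFn, key, values, batchSize) {
  var acc = null;
  for (var i = 0; i < values.length; i += batchSize) {
    var batch = values.slice(i, i + batchSize);
    if (acc !== null) {
      batch.unshift(acc);
    }
    acc = reduceFn(key, batch);
  }
  return acc;
}

// Ten emitted values for one hypothetical key.
var emitted = [];
for (var n = 0; n < 10; n++) {
  emitted.push({ value: 1 });
}

// The count should not depend on how the values are batched.
var counts = [1, 3, 10].map(function (size) {
  return reduceInBatches(fixedReduce, "10.0.0.1", emitted, size).value;
});
console.log(counts);
```

Each batch size yields the same total of 10, which is the behavior the quoted requirements ask for.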

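PS: Special thanks to Kay.Kim <kay.kim@10gen.com> of the MongoDB community. I emailed the MongoDB community to ask about this problem and, to my surprise, received a helpful reply that resolved it.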

2013-04-03 19:14
Category: Database

#2 guodage003 2014-12-06

for(var i = 0; i<2000; i++){
db.test.insert({_id:i, city:"bj"});
}

That is the base data; every document has city set to "bj". I then want to use mapReduce to collect the ids of all documents whose city is "bj".

map=function(){emit(this.city, this._id)};
reduce = function(key,values){
var ret= {id:[]};
values.forEach(function(value){
ret.id.push(value);
});
return ret;
}

The output structure is a mess, full of nesting. It looks like this:
{ "_id" : "bj", "value" : { "id" : [ { "id" : [ { "id" : [ { "id" : [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,......

Following the point made at the end of the post, I changed the form of the emitted value:
map=function(){emit(this.city, {id:this._id})};
reduce = function(key,values){
var ret= {id:[]};
values.forEach(function(value){
ret.id.push(value.id);
});
return ret;
}
The output changed, but it is still nested. It looks like this:
{ "_id" : "bj", "value" : { "id" : [ [ [ [ [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.....

I found your post through a search today and wanted to ask: where is my understanding wrong?
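
One possible reading of the nesting above, offered as a sketch and not as the blog author's reply: in the second attempt map emits {id: number} while reduce returns {id: array}, so on a re-reduce `value.id` is already an array and `push` nests it one level deeper. A hedged fix is to have map emit {id: [this._id]} and to merge with `concat`. The code below simulates this in plain JavaScript; `mapDoc`, `reduceIds`, and the six sample documents are hypothetical.

```javascript
// Simulated map step: in MongoDB this would correspond to
// emit(this.city, {id: [this._id]}), so emitted values have the form {id: array}.
function mapDoc(doc) {
  return { key: doc.city, value: { id: [doc._id] } };
}

// Reducer whose inputs and output share the form {id: array}.
function reduceIds(key, values) {
  var ret = { id: [] };
  values.forEach(function (value) {
    // concat merges the arrays instead of nesting one inside the other.
    ret.id = ret.id.concat(value.id);
  });
  return ret;
}

// Six sample documents, all with city "bj".
var emitted = [];
for (var i = 0; i < 6; i++) {
  emitted.push(mapDoc({ _id: i, city: "bj" }).value);
}

// Reduce in two steps to imitate a re-reduce.
var partial = reduceIds("bj", emitted.slice(0, 4));
var result = reduceIds("bj", [partial].concat(emitted.slice(4)));
console.log(JSON.stringify(result));
```

With matching forms on both sides, the partial result merges into a single flat array of ids.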

#1 IceWee 2014-08-21

Thanks for sharing!
