
Using MapReduce in MongoDB

 

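The mapReduce function is called as follows:

 db.users.mapReduce(map, reduce [, {options}])

The trailing options argument is written as optional, but the out parameter must be supplied; without it the call fails with an error saying that no output was specified. out accepts the following values: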

  • { replace : "collectionName" } - The output will be inserted into a collection which will atomically replace any existing collection with the same name.
  • { merge : "collectionName" } - This option will merge new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new value will overwrite the old one.
  • { reduce : "collectionName" } - If documents exist for a given key in both the result set and the old collection, a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, it will be run after the reduce as well.
  • { inline : 1 } - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. The results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document. In v2.0, this is your only available option on a replica set secondary.
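
To make the difference between replace, merge and reduce concrete, here is a rough model in plain JavaScript. It does not talk to MongoDB at all: applyOut, sum and the sample arrays are hypothetical names made up for this sketch, and the code only imitates the behaviour described in the list above.

```javascript
// Plain-JavaScript model of the three collection-based `out` modes.
// `existing` and `results` are arrays of {_id, value} documents.
// applyOut is a hypothetical helper, not a MongoDB API.
function applyOut(mode, existing, results, reduceFn) {
  if (mode === "replace") {
    // The old contents are discarded entirely.
    return results.map(function (doc) {
      return { _id: doc._id, value: doc.value };
    });
  }
  if (mode !== "merge" && mode !== "reduce") {
    throw new Error("unknown out mode: " + mode);
  }
  // "merge" and "reduce" both start from the old contents.
  const byKey = new Map(existing.map(function (doc) {
    return [doc._id, doc.value];
  }));
  results.forEach(function (doc) {
    if (mode === "reduce" && byKey.has(doc._id)) {
      // Key exists on both sides: combine old and new with the reduce function.
      byKey.set(doc._id, reduceFn(doc._id, [byKey.get(doc._id), doc.value]));
    } else {
      // "merge" (or a brand-new key): the new value simply wins.
      byKey.set(doc._id, doc.value);
    }
  });
  return Array.from(byKey, function (entry) {
    return { _id: entry[0], value: entry[1] };
  });
}

// Sample reduce function: adds up numeric values.
function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

const oldOutput = [{ _id: "cat", value: 3 }, { _id: "dog", value: 3 }];
const newResults = [{ _id: "cat", value: 2 }, { _id: "pig", value: 1 }];

console.log(applyOut("replace", oldOutput, newResults, sum)); // dog is gone
console.log(applyOut("merge", oldOutput, newResults, sum));   // cat is overwritten with 2
console.log(applyOut("reduce", oldOutput, newResults, sum));  // cat becomes 3 + 2 = 5
```

With inline there is no old collection to combine with, which is why it is left out of the model.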

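Also, when you are working in the ./mongo shell, the map and reduce functions must not be wrapped in quotes; quoted, they are strings rather than functions. This is plain JavaScript.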
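
The difference is easy to check in any JavaScript runtime. In this small sketch, mapAsFunction and mapAsString are made-up variable names, and emit is never actually called, so nothing here needs a MongoDB connection:

```javascript
// The same source text, once as a real function and once as a quoted string.
const mapAsFunction = function () { emit(this._id, 1); };
const mapAsString = "function () { emit(this._id, 1); }";

console.log(typeof mapAsFunction); // function
console.log(typeof mapAsString);   // string
```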

Here is an example.

Take a collection (named example) with documents like these:
{_id:4,type:'cat',num:1}
{_id:11,type:'dog',num:3}
{_id:34,type:'pig',num:1}
{_id:40,type:'cat',num:2}


> map=function(){emit(this._id,1)}
function () {
emit(this._id, 1);
}

> reduce=function(key,values){return {count:2}}
function (key, values) {
return {count:2}; //change the value to be output here
}

> res=db.example.mapReduce(map,reduce,{out:"temp"}); //{out:"temp"} is required here; it stores the results in the "temp" collection. {out:{inline:1}} also works and returns the results in memory instead
{
"result" : "temp",
"timeMillis" : 492,
"counts" : {
"input" : 12453,
"emit" : 12453,
"output" : 12076
},
"ok" : 1,
}

> db.temp.find()
{ "_id" : 4, "value" : 1 }
{ "_id" : 11, "value" : 1 }
{ "_id" : 34, "value" : 1 }
{ "_id" : 40, "value" : 1 }


> res
{
"result" : "temp",
"timeMillis" : 492,
"counts" : {
"input" : 12453,
"emit" : 12453,
"output" : 12076
},
"ok" : 1,
}
> //the results can also be queried like this
> res.find()
{ "_id" : 4, "value" : 1 }
{ "_id" : 11, "value" : 1 }
{ "_id" : 34, "value" : 1 }
{ "_id" : 40, "value" : 1 }

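Note: the output above did not turn into the {count:2} value returned by the reduce function; it is still the 1 emitted by map. The reason is that when a key has only one emitted value, mapReduce does not call reduce for that key and writes the emitted value through unchanged. Here every _id is unique, so reduce never runs. In addition, the value field in the output collection generally has the same structure as the value argument passed to emit(key,value) in the map function.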
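
The effect can be reproduced without a server. The sketch below is a toy in-memory map-reduce written for illustration only: runMapReduce and reduceCalls are hypothetical names, the top-level emit variable stands in for the global emit() that the mongo shell provides, and the single-value shortcut is coded by hand to mirror the behaviour described above.

```javascript
let emit; // stand-in for the emit() that the mongo shell makes available to map

// Toy in-memory map-reduce; a hypothetical helper, not a MongoDB API.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Run map once per document, with the document bound to `this`.
  docs.forEach(function (doc) { map.call(doc); });

  const output = [];
  groups.forEach(function (values, key) {
    // A key with a single emitted value skips reduce entirely.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    output.push({ _id: key, value: value });
  });
  return output;
}

const example = [
  { _id: 4, type: "cat", num: 1 },
  { _id: 11, type: "dog", num: 3 },
  { _id: 34, type: "pig", num: 1 },
  { _id: 40, type: "cat", num: 2 }
];

let reduceCalls = 0;
const reduce = function (key, values) {
  reduceCalls += 1;
  return { count: 2 };
};

// Keyed by _id: every key is unique, so reduce is never invoked.
const byId = runMapReduce(example, function () { emit(this._id, 1); }, reduce);
console.log(byId, reduceCalls);

// Keyed by type: only "cat" has two values, so reduce runs once.
const byType = runMapReduce(example, function () { emit(this.type, 1); }, reduce);
console.log(byType, reduceCalls);
```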

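If map and reduce are changed to the following, reduce will be called (for the key "cat", which has two documents):
> map=function(){
emit(this.type,{totalNum:this.num});
}
> reduce=function(key,values){
var totalNum=0;
values.forEach(function(val){
totalNum+=val.totalNum; //each value is an object of the form {totalNum:n}
});
return {totalNum:totalNum}; //return the same structure that map emits
}
> res=db.example.mapReduce(map,reduce,{out:"temp"});
> res.find();
{_id:"cat",value:{totalNum:3}}
{_id:"dog",value:{totalNum:3}}
{_id:"pig",value:{totalNum:1}}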
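
One property worth checking in any reduce function: MongoDB may call reduce more than once for the same key and feed an earlier reduce result back in as one of the values, so the return value should have the same structure as what map emits. A minimal check in plain JavaScript, where reduceTotals and the sample values are names and data invented for this sketch:

```javascript
// A reduce function whose output has the same shape as its inputs.
function reduceTotals(key, values) {
  let totalNum = 0;
  values.forEach(function (val) { totalNum += val.totalNum; });
  return { totalNum: totalNum };
}

const emitted = [{ totalNum: 1 }, { totalNum: 2 }, { totalNum: 4 }];

// Reduce everything in one call ...
const allAtOnce = reduceTotals("cat", emitted);

// ... or reduce a partial batch first and feed that result back in.
const partial = reduceTotals("cat", emitted.slice(0, 2));
const inTwoSteps = reduceTotals("cat", [partial, emitted[2]]);

console.log(allAtOnce, inTwoSteps); // both paths should agree
```

A reduce that returned a bare number would break on the second path, because the fed-back value would no longer have a totalNum field.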

If you want features such as sort and limit, you can use res.find().sort({"value.totalNum":-1}).limit(2) to get them.
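
For readers who want to see what that query selects, the following is a plain-JavaScript equivalent over an in-memory copy of the result documents. The results and topTwo names are made up for this sketch; it mimics a descending sort on value.totalNum followed by a limit of 2, and does not use the MongoDB cursor API.

```javascript
// In-memory copy of the map-reduce output documents.
const results = [
  { _id: "cat", value: { totalNum: 3 } },
  { _id: "dog", value: { totalNum: 3 } },
  { _id: "pig", value: { totalNum: 1 } }
];

const topTwo = results
  .slice() // copy first so the original array keeps its order
  .sort(function (a, b) { return b.value.totalNum - a.value.totalNum; }) // descending, like -1
  .slice(0, 2); // like limit(2)

console.log(topTwo);
```

Because cat and dog tie on totalNum, their relative order in a real query is not something to rely on unless a second sort key is added.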

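One more note on using the Java client with MongoDB: the documents returned by MapReduceOutput.getOutputCollection().find() come back as DBObject instances like this:
{_id:"cat",value:{totalNum:3}}
But if you write
DBObject object=MapReduceOutput.getOutputCollection().findOne(); //MapReduceOutput here stands for the object returned by DBCollection.mapReduce(); it is written this way only for brevity, so don't be misled
System.out.println(object.get("value.totalNum")); //this prints null
the result is null, because get() does not treat "." as a path separator; it looks for a top-level field literally named "value.totalNum". object.get("value") returns an Object, and it has to be cast to DBObject before its fields can be read. The correct approach is:
DBObject value=(DBObject)object.get("value");
System.out.println(value.get("totalNum"));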
