Date: 2014-05-16

Using MapReduce in MongoDB

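The mapReduce function is called as follows:

db.users.mapReduce(map, reduce [, {option}])

The trailing option argument is shown as optional, but the out parameter must be supplied; without it, the call fails with an error saying that no output was specified. out accepts the following values: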

  • { replace : "collectionName" } - The output will be inserted into a collection which will atomically replace any existing collection with the same name.
  • { merge : "collectionName" } - This option will merge new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new value will overwrite the old one.
  • { reduce : "collectionName" } - If a document exists for a given key in both the result set and the old collection, a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, it will be run after the reduce as well.
  • { inline : 1 } - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. The results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document. In v2.0, this is your only available option on a replica set secondary.
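
The difference between replace, merge and reduce can be sketched in plain JavaScript. This is only an in-memory model of the behavior described above, not MongoDB code: the helper applyOut, the sum function and the sample objects are hypothetical names introduced for illustration, and each object stands in for a collection keyed by _id.

```javascript
// In-memory model of the out modes (an illustration, not MongoDB's implementation).
// `existing` stands in for the old output collection, `results` for the new result set.
function applyOut(mode, existing, results, reduceFn) {
  if (mode === "replace") {
    // The old collection is discarded entirely.
    return Object.assign({}, results);
  }
  if (mode === "merge") {
    // New values overwrite old values that share a key.
    return Object.assign({}, existing, results);
  }
  if (mode === "reduce") {
    // Keys present on both sides are combined with the reduce function.
    var merged = Object.assign({}, existing);
    Object.keys(results).forEach(function (key) {
      merged[key] = Object.prototype.hasOwnProperty.call(existing, key)
        ? reduceFn(key, [existing[key], results[key]])
        : results[key];
    });
    return merged;
  }
  throw new Error("unknown mode: " + mode);
}

// Hypothetical reduce function: adds up the values for a key.
function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var oldOutput = { cat: 3, dog: 3 };
var newResults = { cat: 2, pig: 1 };

console.log(applyOut("replace", oldOutput, newResults, sum)); // { cat: 2, pig: 1 }
console.log(applyOut("merge", oldOutput, newResults, sum));   // { cat: 2, dog: 3, pig: 1 }
console.log(applyOut("reduce", oldOutput, newResults, sum));  // { cat: 5, dog: 3, pig: 1 }
```

Only the reduce mode calls the reduce function again, which is why that function should accept its own earlier output as input.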

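Also, when you connect with the ./mongo client, the map and reduce functions must not be wrapped in quotes; a quoted function is a string, not a function. This is plain JavaScript behavior.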
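
The distinction can be checked in any JavaScript environment. A minimal sketch follows; the variable names are hypothetical, and emit is never called here, so it does not need to exist:

```javascript
// A function literal versus the same source text wrapped in quotes.
var mapAsFunction = function () { emit(this._id, 1); };
var mapAsString = "function () { emit(this._id, 1); }";

console.log(typeof mapAsFunction); // prints: function
console.log(typeof mapAsString);   // prints: string
```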

For example:

Given a collection (named example) with documents of the following form:
{_id:4,type:'cat',num:1}
{_id:11,type:'dog',num:3}
{_id:34,type:'pig',num:1}
{_id:40,type:'cat',num:2}


> map=function(){emit(this._id,1)}
function () {
emit(this._id, 1);
}

> reduce=function(key,values){return {count:1}}
function (key, values) {
return {count:1}; // change the value to output here (1)
}

> res=db.example.mapReduce(map,reduce,{out:"temp"}); // {out:"temp"} is required here; it stores the results in the "temp" collection. {out:{inline:1}} can be used instead to return the results in memory
{
"result" : "temp",
"timeMillis" : 492,
"counts" : {
"input" : 12453,
"emit" : 12453,
"output" : 12076
},
"ok" : 1,
}

> db.temp.find()
{ "_id" : 4, "value" : 1 }
{ "_id" : 11, "value" : 1 }
{ "_id" : 34, "value" : 1 }
{ "_id" : 40, "value" : 1 }
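
The stored value is 1 rather than {count:1} because MongoDB does not call the reduce function for a key that has only a single emitted value, and here every _id is emitted exactly once. The following is a simplified in-memory sketch of that behavior, not MongoDB's implementation; the emitted map, the reduceCalls counter and the standalone emit function are assumptions made for illustration.

```javascript
// Simplified model of map-reduce over the four sample documents.
var docs = [
  { _id: 4, type: "cat", num: 1 },
  { _id: 11, type: "dog", num: 3 },
  { _id: 34, type: "pig", num: 1 },
  { _id: 40, type: "cat", num: 2 }
];

var emitted = {};    // key -> list of values emitted for that key
var reduceCalls = 0; // how many times reduce was actually invoked

// Stand-in for the emit() that MongoDB provides inside map functions.
function emit(key, value) {
  if (!emitted[key]) { emitted[key] = []; }
  emitted[key].push(value);
}

// Same map and reduce as in the shell session, plus a call counter.
var map = function () { emit(this._id, 1); };
var reduce = function (key, values) { reduceCalls += 1; return { count: 1 }; };

// Map phase: run map with each document bound to `this`.
docs.forEach(function (doc) { map.call(doc); });

// Reduce phase: keys with a single value skip reduce entirely.
var output = Object.keys(emitted).map(function (key) {
  var values = emitted[key];
  return {
    _id: Number(key),
    value: values.length === 1 ? values[0] : reduce(key, values)
  };
});

console.log(output);      // four documents, each with value 1
console.log(reduceCalls); // 0
```

Emitting a key shared by several documents, such as this.type, would make reduce run for that key.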


> res
{
"result" : "temp",
"timeMillis" : 492,
"counts" : {
"input" : 12453,
"emit" : 12453,