Comparing group(), mapReduce() and aggregate() in MongoDB

abc123456789cba

Blog category: NoSQL

In SQL, to list the numbers (`no`) of all members of each team in the `users` table, you would write:

```sql
-- (1) MySQL syntax; PostgreSQL would use array_agg(no) instead of GROUP_CONCAT(no)
SELECT team, GROUP_CONCAT(no) AS members FROM users GROUP BY team;
```
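To make the target result concrete, here is a minimal plain-JavaScript sketch (runs in Node.js, no MongoDB involved) of what query (1) is meant to produce. The sample `users` rows and the helper `groupMembersByTeam` are hypothetical illustrations, not part of the original post.

```javascript
// Hypothetical sample rows standing in for the users table / collection.
const users = [
  { team: "red", no: 1 },
  { team: "blue", no: 2 },
  { team: "red", no: 3 },
  { team: "green", no: 4 },
];

// Collect every member's no per team, keeping first-seen team order.
function groupMembersByTeam(rows) {
  const byTeam = new Map();
  for (const row of rows) {
    if (!byTeam.has(row.team)) byTeam.set(row.team, []);
    byTeam.get(row.team).push(row.no);
  }
  return Array.from(byTeam, ([team, members]) => ({ team, members }));
}

console.log(JSON.stringify(groupMembersByTeam(users)));
// [{"team":"red","members":[1,3]},{"team":"blue","members":[2]},{"team":"green","members":[4]}]
```

Each of the three MongoDB approaches below aims at this same "one entry per team, with a list of numbers" shape.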

With MongoDB, implementing the same thing is more involved.

Since MongoDB 2.2, three functions can do this. In the order they were introduced, they are group(), mapReduce() and aggregate().

How do they differ? The following summary is adapted from this Stack Overflow discussion: http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce

1. db.collection.group().

Signature:

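```javascript
// The shell helper takes a single document of options.
db.collection.group({
    key: <document>,       // fields to group by, e.g. {team: 1}
    reduce: <function>,    // function(cur, result) { ... }
    initial: <document>,   // starting value of each group's accumulator
    keyf: <function>,      // optional: computes the grouping key instead of `key`
    cond: <document>,      // optional: filter applied to input documents
    finalize: <function>   // optional: runs on each group's result
})
```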

Characteristics:

Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
Returns the result set inline (as an array of grouped items).
Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current limitations:
- Will not group into a result set with more than 20,000 unique keys (10,000 before MongoDB 2.2).
- Results must fit within the limits of a BSON document (currently 16 MB).
- Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
- Does not work with sharded collections.

Example: to implement query (1):

```javascript
// Each group starts as {team: ..., members: []}; reduce appends the current user's no.
db.users.group({key: {team: 1}, initial: {members: []}, reduce: function (cur, result) { result.members.push(cur.no); }});
```
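The contract of `group()` is that `reduce(cur, result)` is called once per document and updates an accumulator that starts from `initial`. Below is a minimal plain-JavaScript sketch of that contract, not MongoDB code: `simulateGroup` and the sample data are hypothetical, and only the `key` form is modeled (no `keyf`, `cond` or `finalize`).

```javascript
// Hypothetical in-memory stand-in for db.collection.group({key, initial, reduce}).
function simulateGroup(docs, { key, initial, reduce }) {
  const keyFields = Object.keys(key);
  const groups = new Map();
  for (const doc of docs) {
    // Build the group key from the fields named in `key` (e.g. {team: 1}).
    const keyDoc = {};
    for (const f of keyFields) keyDoc[f] = doc[f];
    const id = JSON.stringify(keyDoc);
    if (!groups.has(id)) {
      // Each group starts from a deep copy of `initial`, merged with its key fields.
      groups.set(id, Object.assign({}, keyDoc, JSON.parse(JSON.stringify(initial))));
    }
    // reduce(cur, result) updates the accumulator in place; this sketch ignores its return value.
    reduce(doc, groups.get(id));
  }
  return Array.from(groups.values());
}

const sampleUsers = [
  { team: "red", no: 1 },
  { team: "blue", no: 2 },
  { team: "red", no: 3 },
];

const grouped = simulateGroup(sampleUsers, {
  key: { team: 1 },
  initial: { members: [] },
  reduce: function (cur, result) { result.members.push(cur.no); },
});
console.log(JSON.stringify(grouped));
// [{"team":"red","members":[1,3]},{"team":"blue","members":[2]}]
```

Deep-copying `initial` matters: if every group shared the same `members` array, all teams would end up with the same list.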

2. db.collection.mapReduce().

Reportedly, mapReduce() was added to cater to the popularity of the MapReduce model. Its signature is:

```javascript
db.collection.mapReduce(
    <map function>,
    <reduce function>,
    {
        out: <collection>,    // or an action document, e.g. {inline: 1}, {merge: "coll"}, {replace: "coll"}, {reduce: "coll"}
        query: <document>,
        sort: <document>,
        limit: <number>,
        finalize: <function>,
        scope: <document>,
        jsMode: <boolean>,
        verbose: <boolean>
    }
)
```

Characteristics:

Implements the MapReduce model for processing large data sets.
Can choose from one of several output options (inline, new collection, merge, replace, reduce).
MapReduce functions are written in JavaScript.
Supports non-sharded and sharded input collections.
Can be used for incremental aggregation over large collections.
MongoDB 2.2 implements much better support for sharded map reduce output.
Current limitations:
- There is a JavaScript lock, so a mongod server can execute only one JavaScript function at a time; however, most steps of a MapReduce are very short, so the lock can be yielded frequently.
- MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
- MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.

Because it relies on the JavaScript engine, mapReduce() is comparatively slow; for details see http://technicaldebt.com/?p=1157

Example: to implement query (1):

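```javascript
var map = function () { emit(this.team, {members: [this.no]}); };  // emitted value has the same shape reduce returns
var reduce = function (key, values) {  // may be re-run on its own output, so it must return {members: [...]} too
    return {members: values.reduce(function (acc, v) { return acc.concat(v.members); }, [])};
};
db.users.mapReduce(map, reduce, {out: "team_member"});  // output docs look like {_id: <team>, value: {members: [...]}}
```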
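Why must `reduce` return the same shape as the values passed to `emit`? MongoDB may call `reduce` more than once for the same key, feeding earlier reduce output back in as input, and it does not call `reduce` at all for a key that emitted a single value. The plain-JavaScript sketch below imitates that behavior; `simulateMapReduce` and its `batchSize` splitting rule are hypothetical assumptions, not MongoDB's real scheduling.

```javascript
// In the mongo shell, map functions call a global emit(); this sketch
// provides one that forwards to a collector installed by simulateMapReduce.
let collector = null;
function emit(key, value) {
  collector(key, value);
}

// Map and reduce both use the shape {members: [...]} (hypothetical example pair).
const map = function () {
  emit(this.team, { members: [this.no] });
};
const reduce = function (key, values) {
  return { members: values.reduce((acc, v) => acc.concat(v.members), []) };
};

// Hypothetical imitation of mapReduce with inline output. batchSize (>= 2)
// decides how values are split, so reduce also gets fed its own output.
function simulateMapReduce(docs, mapFn, reduceFn, batchSize) {
  if (batchSize < 2) throw new Error("batchSize must be at least 2");
  const emitted = new Map();
  collector = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc)); // `this` inside map is the document
  const results = [];
  for (const [key, values] of emitted) {
    let pending = values;
    // A key with a single emitted value is never passed to reduce.
    while (pending.length > 1) {
      const next = [];
      for (let i = 0; i < pending.length; i += batchSize) {
        const batch = pending.slice(i, i + batchSize);
        next.push(batch.length > 1 ? reduceFn(key, batch) : batch[0]);
      }
      pending = next;
    }
    results.push({ _id: key, value: pending[0] });
  }
  return results;
}

const teamUsers = [
  { team: "red", no: 1 },
  { team: "blue", no: 2 },
  { team: "red", no: 3 },
  { team: "red", no: 5 },
];
// Same answer whether red's three values are reduced in one pass or re-reduced in pairs.
console.log(JSON.stringify(simulateMapReduce(teamUsers, map, reduce, 2)));
console.log(JSON.stringify(simulateMapReduce(teamUsers, map, reduce, 100)));
// both print [{"_id":"red","value":{"members":[1,3,5]}},{"_id":"blue","value":{"members":[2]}}]
```

A `reduce` that instead returned `{team: key, members: values}` while `map` emitted bare numbers would break both ways: in this sketch with `batchSize` 2, red's value would become `{team: "red", members: [{team: "red", members: [1, 3]}, 5]}`, and blue's value would stay the bare number `2` because its single value is never reduced.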

3. db.collection.aggregate().

For simpler tasks, mapReduce is too big a hammer. The aggregation framework avoids the overhead of the JavaScript engine and can also select matching subdocuments and arrays. It is implemented in C++ as a pipeline.

The pipeline defines the following operators:

$match – filters documents with a query predicate.

$project – reshapes each document: selects, adds, or computes fields.

$unwind – deconstructs an array field, outputting one document per array element.

$group – groups documents into buckets defined by a key and aggregates each bucket.

$sort – sorts documents.

$limit – passes only the specified number of documents.

$skip – skips over the specified number of documents.
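The pipeline idea can be pictured as a list of stages, each taking an array of documents and returning a new array that feeds the next stage. The plain-JavaScript sketch below models the stages listed above in a deliberately simplified form; `stages` and `runPipeline` are hypothetical names, and only equality `$match`, inclusion `$project`, array-only `$unwind`, and the `$push` / numeric `$sum` accumulators are handled.

```javascript
// Simplified stage implementations; each maps an array of docs to a new array.
const stages = {
  // $match: keep docs whose fields equal the given values (equality only).
  $match: (docs, spec) =>
    docs.filter((d) => Object.keys(spec).every((k) => d[k] === spec[k])),
  // $project: keep only fields marked 1 (inclusion only, _id not special-cased).
  $project: (docs, spec) =>
    docs.map((d) => {
      const out = {};
      for (const k of Object.keys(spec)) if (spec[k] === 1 && k in d) out[k] = d[k];
      return out;
    }),
  // $unwind: one doc per element of the named array field; empty/missing arrays drop the doc.
  $unwind: (docs, path) => {
    const field = path.slice(1);
    return docs.flatMap((d) => (d[field] || []).map((v) => ({ ...d, [field]: v })));
  },
  // $group: bucket by "$field"; supports {$push: "$f"} and {$sum: <number>} only.
  $group: (docs, spec) => {
    const buckets = new Map();
    for (const d of docs) {
      const id = d[spec._id.slice(1)];
      if (!buckets.has(id)) buckets.set(id, { _id: id });
      const acc = buckets.get(id);
      for (const [name, op] of Object.entries(spec)) {
        if (name === "_id") continue;
        if ("$push" in op) (acc[name] = acc[name] || []).push(d[op.$push.slice(1)]);
        else if ("$sum" in op) acc[name] = (acc[name] || 0) + op.$sum;
      }
    }
    return Array.from(buckets.values());
  },
  // $sort: 1 ascending, -1 descending, keys compared in order.
  $sort: (docs, spec) =>
    [...docs].sort((a, b) => {
      for (const [k, dir] of Object.entries(spec)) {
        if (a[k] < b[k]) return -dir;
        if (a[k] > b[k]) return dir;
      }
      return 0;
    }),
  $skip: (docs, n) => docs.slice(n),
  $limit: (docs, n) => docs.slice(0, n),
};

// Run each stage in order, feeding one stage's output into the next.
function runPipeline(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    const [op, spec] = Object.entries(stage)[0];
    if (!stages[op]) throw new Error("unsupported stage " + op);
    return stages[op](current, spec);
  }, docs);
}

// Hypothetical sample data.
const players = [
  { team: "red", no: 1, tags: ["a", "b"] },
  { team: "blue", no: 2, tags: ["a"] },
  { team: "red", no: 3, tags: [] },
];

const byTag = runPipeline(players, [
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
]);
console.log(JSON.stringify(byTag)); // [{"_id":"a","count":2},{"_id":"b","count":1}]

const redMembers = runPipeline(players, [
  { $match: { team: "red" } },
  { $project: { team: 1, no: 1 } },
  { $group: { _id: "$team", members: { $push: "$no" } } },
]);
console.log(JSON.stringify(redMembers)); // [{"_id":"red","members":[1,3]}]
```

Because every stage has the same array-in, array-out shape, stages can be repeated or reordered freely, which is the property the characteristics below describe.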

Characteristics:

New feature in the MongoDB 2.2.0 production release (August, 2012).
Designed with specific goals of improving performance and usability.
Returns result set inline.
Supports non-sharded and sharded input collections.
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current limitations:
- Results are returned inline, so they are limited to the maximum document size supported by the server (16 MB).
- Doesn't support as many output options as MapReduce
- Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)
- Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.

Example: to implement query (1):

```javascript
// $project is optional here; $addToSet collects the distinct no values per team.
db.users.aggregate([{$project: {team: 1, no: 1}}, {$group: {_id: "$team", members: {$addToSet: "$no"}}}]);
```
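Note that `$addToSet` keeps only distinct values per group, whereas the `group()` and `mapReduce()` examples above append every value, which corresponds to `$push`. The plain-JavaScript sketch below contrasts the two behaviors on hypothetical data containing a duplicate `no`; it mirrors the documented semantics but is not MongoDB code, and in MongoDB the element order produced by `$addToSet` is not guaranteed.

```javascript
// Hypothetical data: team "red" lists member number 1 twice.
const rows = [
  { team: "red", no: 1 },
  { team: "red", no: 1 },
  { team: "red", no: 3 },
  { team: "blue", no: 2 },
];

// Group by team, keeping every value ($push-like) or distinct values only ($addToSet-like).
function accumulate(docs, distinct) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.team)) groups.set(d.team, distinct ? new Set() : []);
    const acc = groups.get(d.team);
    if (distinct) acc.add(d.no);
    else acc.push(d.no);
  }
  return Array.from(groups, ([team, acc]) => ({ _id: team, members: [...acc] }));
}

console.log(JSON.stringify(accumulate(rows, false)));
// [{"_id":"red","members":[1,1,3]},{"_id":"blue","members":[2]}]
console.log(JSON.stringify(accumulate(rows, true)));
// [{"_id":"red","members":[1,3]},{"_id":"blue","members":[2]}]
```

If `no` is unique per user, both accumulators give the same members; use `$push` when duplicates must be kept.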

Refs:

http://docs.mongodb.org/manual/aggregation/#Aggregation-Examples

http://docs.mongodb.org/manual/reference/method/db.collection.group/

http://technicaldebt.com/?p=1157

http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce

2013-09-10 16:45