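In SQL, collecting the number (no) of every member of each team from the users table looks roughly like the statement below. It is shorthand: a strict SQL dialect would need an aggregate here, such as MySQL's GROUP_CONCAT(no).
- SELECT team, no FROM users GROUP BY team (1)
Implementing the same thing in MongoDB is more involved.
As of MongoDB 2.2 there are three functions that can do this. In the order they were introduced, they are group(), mapReduce(), and aggregate().
How do they differ? The notes below are compiled from the Stack Overflow discussion at http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce: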
1. db.collection.group().
Definition (the options are passed as a single document):
- db.collection.group({
- key,
- reduce,
- initial,
- keyf,
- cond,
- finalize
- })
Characteristics:
- Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
- Returns result set inline (as an array of grouped items).
- Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current Limitations
- Will not group into a result set with more than 10,000 keys (the limit was raised to 20,000 in MongoDB 2.2).
- Results must fit within the limitations of a BSON document (currently 16 MB).
- Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
- Does not work with sharded collections.
Ex: to get the effect of statement (1), write:
- db.users.group({key: {team: 1}, initial: {members: []}, reduce: function(cur, result){result.members.push(cur.no);}});
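To make the roles of key, initial, and reduce concrete, here is a minimal sketch in plain JavaScript of how a group()-style fold could work. It is an illustration only, not MongoDB's implementation: groupSketch and the sample users array are hypothetical names invented for this example, and only a single key field is handled.

```javascript
// Plain-JavaScript sketch of group() semantics (hypothetical helper, not MongoDB code).
function groupSketch(docs, keyField, initial, reduce) {
  const buckets = new Map();
  for (const doc of docs) {
    const k = doc[keyField];
    if (!buckets.has(k)) {
      // Every distinct key starts from its own deep copy of `initial`.
      const acc = Object.assign({ [keyField]: k }, JSON.parse(JSON.stringify(initial)));
      buckets.set(k, acc);
    }
    // reduce(cur, result) mutates the accumulator in place, as in the shell example.
    reduce(doc, buckets.get(k));
  }
  return Array.from(buckets.values());
}

// Hypothetical sample data.
const users = [
  { team: "a", no: 1 },
  { team: "b", no: 2 },
  { team: "a", no: 3 },
];

const grouped = groupSketch(users, "team", { members: [] }, function (cur, result) {
  result.members.push(cur.no);
});
console.log(JSON.stringify(grouped));
```

The result is returned as one in-memory array, which is why group() is bound by the document size and key count limits listed above.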
2. db.collection.mapReduce().
Reportedly, mapReduce was added to cater to the popularity of the MapReduce model.
- db.collection.mapReduce(
- <mapfunction>,
- <reducefunction>,
- {
- out: <collection>,
- query: <document>,
- sort: <document>,
- limit: <number>,
- finalize: <function>,
- scope: <document>,
- jsMode: <boolean>,
- verbose: <boolean>
- }
- )
Characteristics:
- Implements the MapReduce model for processing large data sets.
- Can choose from one of several output options (inline, new collection, merge, replace, reduce)
- MapReduce functions are written in JavaScript.
- Supports non-sharded and sharded input collections.
- Can be used for incremental aggregation over large collections.
- MongoDB 2.2 implements much better support for sharded map reduce output.
Current Limitations
- There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a point in time. However, most steps of the MapReduce are very short, so locks can be yielded frequently.
- MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
- MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.
Because it relies on the JavaScript engine, it is relatively slow; see http://technicaldebt.com/?p=1157 for details.
Ex: to get the effect of statement (1), write:
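- var map = function(){ emit(this.team, {members: [this.no]}); };
- // reduce must return the same shape that map emits: it may be re-run on its own output, and it is skipped for keys with a single value
- var reduce = function(key, values){ var members = []; values.forEach(function(v){ members = members.concat(v.members); }); return {members: members}; };
- db.users.mapReduce(map, reduce, {out: "team_member"});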
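One thing to keep in mind with mapReduce is that reduce can be invoked more than once for the same key, with earlier reduce output fed back in as one of the values, and that it is not invoked at all for a key with a single emitted value. The sketch below imitates that behaviour in plain JavaScript. It is only a model: mapReduceSketch, the batchSize parameter, and the sample data are hypothetical, and emit is passed to map as an argument here instead of being a global as in the mongo shell.

```javascript
// Plain-JavaScript model of map/reduce with re-reduce (hypothetical helper).
function mapReduceSketch(docs, map, reduce, batchSize) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));

  const size = Math.max(2, batchSize); // a batch needs at least two values
  const out = [];
  for (const [key, values] of emitted) {
    const pending = values.slice();
    // With a single value, reduce is skipped and the value is used as-is.
    while (pending.length > 1) {
      const batch = pending.splice(0, size);
      pending.unshift(reduce(key, batch)); // earlier output is reduced again
    }
    out.push({ _id: key, value: pending[0] });
  }
  return out;
}

const mapFn = function (emit) { emit(this.team, { members: [this.no] }); };
const reduceFn = function (key, values) {
  let members = [];
  values.forEach((v) => { members = members.concat(v.members); });
  return { members: members }; // same shape as the emitted values
};

// Hypothetical sample data.
const people = [
  { team: "a", no: 1 }, { team: "b", no: 2 },
  { team: "a", no: 3 }, { team: "a", no: 5 },
];
const small = mapReduceSketch(people, mapFn, reduceFn, 2);
const large = mapReduceSketch(people, mapFn, reduceFn, 10);
console.log(JSON.stringify(small));
```

Because reduceFn returns the same shape it receives, the result does not depend on how the values are batched. A reduce that returned a differently shaped object would nest or lose data once its output was reduced again.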
3. db.collection.aggregate().
For simpler tasks, mapReduce is a big hammer. The aggregation framework avoids the overhead of the JavaScript engine and can also select matching subdocuments and arrays. It is implemented as a pipeline in C++.
The pipeline defines the following operators:
$match – query predicate as a filter.
$project – use a sample document to determine the shape of the result.
$unwind – hands out array elements one at a time.
$group – aggregates items into buckets defined by a key.
$sort – sort documents.
$limit – allow the specified number of documents to pass.
$skip – skip over the specified number of documents.
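The pipeline idea can be pictured as a chain of functions, each taking a list of documents and returning a new list. The following is a rough sketch in plain JavaScript under that assumption. The helper names merely mirror the operator names, the helpers take JavaScript callbacks where the real operators take query and sort documents, and the sample data is hypothetical.

```javascript
// Each stage maps an array of documents to a new array (illustrative helpers only).
const $match = (pred) => (docs) => docs.filter(pred);
const $unwind = (field) => (docs) =>
  docs.flatMap((d) => (d[field] || []).map((item) => ({ ...d, [field]: item })));
const $sort = (cmp) => (docs) => docs.slice().sort(cmp);
const $skip = (n) => (docs) => docs.slice(n);
const $limit = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding each stage's output to the next one.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Hypothetical sample data.
const teams = [
  { team: "a", tags: ["x", "y"], size: 3 },
  { team: "b", tags: ["z"], size: 1 },
  { team: "c", tags: ["x"], size: 5 },
];

const sortKey = (d) => d.team + d.tags;
const piped = runPipeline(teams, [
  $match((d) => d.size >= 3),   // keeps teams a and c
  $unwind("tags"),              // one document per tag: a/x, a/y, c/x
  $sort((p, q) => (sortKey(p) < sortKey(q) ? -1 : sortKey(p) > sortKey(q) ? 1 : 0)),
  $skip(1),                     // drops a/x
  $limit(1),                    // keeps a/y
]);
console.log(JSON.stringify(piped));
```

Note that the number of documents changes from stage to stage: $match shrinks the list, $unwind grows it, and $skip and $limit trim it.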
Characteristics:
- New feature in the MongoDB 2.2.0 production release (August, 2012).
- Designed with specific goals of improving performance and usability.
- Returns result set inline.
- Supports non-sharded and sharded input collections.
- Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
- Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
- Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
- Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current Limitations
- Results are returned inline, so are limited to the maximum document size supported by the server (16 MB).
- Doesn't support as many output options as MapReduce
- Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)
- Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.
Ex: to get the effect of statement (1), write:
- db.users.aggregate([{$project: {team: 1, no: 1}}, {$group: { _id: "$team", members: {$addToSet: "$no"}}}]);
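Since $addToSet keeps only distinct values, a team in which two members share the same no would list that number once, whereas $push would keep every occurrence. The plain-JavaScript sketch below models that difference. groupByTeam and its dedupe flag are hypothetical names for this illustration, the sample data is made up, and the real $addToSet does not guarantee any particular order of the resulting array.

```javascript
// Model of {$group: {_id: "$team", members: {...}}} with set or list accumulation.
// dedupe = true imitates $addToSet, dedupe = false imitates $push.
function groupByTeam(docs, dedupe) {
  const byTeam = new Map();
  for (const { team, no } of docs) {
    if (!byTeam.has(team)) byTeam.set(team, []);
    const members = byTeam.get(team);
    if (!dedupe || !members.includes(no)) members.push(no);
  }
  return Array.from(byTeam, ([_id, members]) => ({ _id, members }));
}

// Hypothetical sample data with a duplicated number in team "a".
const roster = [
  { team: "a", no: 1 }, { team: "a", no: 1 },
  { team: "a", no: 2 }, { team: "b", no: 3 },
];

const asSet = groupByTeam(roster, true);
const asList = groupByTeam(roster, false);
console.log(JSON.stringify(asSet), JSON.stringify(asList));
```

If every member must appear in the output even when numbers repeat, $push is the accumulator to reach for in the shell example.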
Refs:
http://docs.mongodb.org/manual/aggregation/#Aggregation-Examples
http://docs.mongodb.org/manual/reference/method/db.collection.group/
http://technicaldebt.com/?p=1157
http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce