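6. Aggregation Operations

Aggregation operations process data records and return computed results, such as averages or sums. They group values from multiple documents and can perform a variety of operations on the grouped data to return a single result. MongoDB offers three kinds of aggregation: single-purpose aggregation methods, the aggregation pipeline, and MapReduce.

Single-purpose aggregation: gives simple access to common aggregation tasks; each operation aggregates documents from a single collection.
Aggregation pipeline: a framework for data aggregation modeled on the concept of a data processing pipeline. Documents enter a multi-stage pipeline that transforms them into an aggregated result.
MapReduce: has two phases, a map phase that processes each document and emits one or more objects for each input document, and a reduce phase that combines the output of the map operation.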

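6.1. Single-Purpose Aggregation

MongoDB provides single-purpose aggregation methods such as db.collection.estimatedDocumentCount(), db.collection.count(), and db.collection.distinct(). All of these operations aggregate documents from a single collection. They give simple access to common aggregation tasks, but they lack the flexibility and power of the aggregation pipeline and MapReduce.

Method	Description
db.collection.estimatedDocumentCount()	Ignores any query filter and returns the count of all documents in a collection or view
db.collection.count()	Returns the count of documents that would match a find() query on the collection or view. Equivalent to the db.collection.find(query).count() construct
db.collection.distinct()	Finds the distinct values of a specified field in a single collection or view and returns them in an array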

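// Count all documents in the books collection
db.books.estimatedDocumentCount()
// Count all documents that match the query
db.books.count({favCount:{$gt:50}})
// Return an array of the distinct type values
db.books.distinct("type")
// Return an array of the distinct type values among documents with more than 90 favorites
db.books.distinct("type",{favCount:{$gt:90}})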

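Note: on a sharded cluster, db.collection.count() without a query predicate can return an inaccurate count if orphaned documents exist or a chunk migration is in progress. To avoid this, use the db.collection.aggregate() method on sharded clusters.

6.2. Aggregation Pipeline

What is the MongoDB aggregation framework

The MongoDB Aggregation Framework is a computation framework that can:

operate on one or several collections;
run a series of operations on the data in those collections;
transform the data into the desired shape.

In terms of effect, the aggregation framework corresponds to GROUP BY, LEFT OUTER JOIN, AS, and similar constructs in SQL queries.

Pipeline and Stage

The whole aggregation computation is called a pipeline. It is made up of multiple stages, and each pipeline:

accepts a series of documents (the raw data);
has each stage perform a series of operations on those documents;
passes the resulting documents on to the next stage.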
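The pipeline idea can be sketched in plain JavaScript, without a MongoDB server. This is only an illustration of the data flow: the helpers match, project, limit, and runPipeline are hypothetical stand-ins written for this sketch, not MongoDB APIs, and the sample documents are made up.

```javascript
// Each stage is modeled as a function from an array of documents
// to a new array of documents (hypothetical helpers, not MongoDB APIs).
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (shape) => (docs) => docs.map(shape);
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding each stage's output to the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample data for the sketch.
const sampleBooks = [
  { title: "book-0", type: "technology", favCount: 10 },
  { title: "book-1", type: "novel", favCount: 80 },
  { title: "book-2", type: "technology", favCount: 55 },
];

const pipelineResult = runPipeline(sampleBooks, [
  match((doc) => doc.type === "technology"), // keeps book-0 and book-2
  project((doc) => ({ name: doc.title })),   // renames title to name
  limit(1),                                  // keeps the first document
]);

console.log(pipelineResult); // [ { name: 'book-0' } ]
```

Each stage only sees what the previous stage produced, which is why the order of stages matters.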

Aggregation pipeline syntax

pipeline = [$stage1, $stage2, ...$stageN];
db.collection.aggregate(pipeline, {options})

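pipeline: an array of aggregation stages. Every stage except $out, $merge, and $geoNear can appear multiple times in a pipeline.
options: optional, additional parameters for the aggregation, including the query plan, whether to use temporary files, the cursor, the maximum operation time, read and write concerns, forcing an index, and so on.

Common pipeline stages

Stage	Description	SQL equivalent
$match	Filter	WHERE
$project	Projection	AS
$lookup	Left outer join	LEFT OUTER JOIN
$sort	Sort	ORDER BY
$group	Group	GROUP BY
$skip / $limit	Pagination
$unwind	Unwind an array
$graphLookup	Graph search
$facet / $bucket	Faceted search

Documentation: Aggregation Pipeline Stages (MongoDB Manual)

Data preparation

Prepare the data set by running a script.

Write the script with vim book2.js, then load it with load("book2.js").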

var tags = ["nosql","mongodb","document","developer","popular"];
var types = ["technology","sociality","travel","novel","literature"];
var books=[];
for(var i=0;i<50;i++){
  var typeIdx = Math.floor(Math.random()*types.length);
  var tagIdx = Math.floor(Math.random()*tags.length);
  var tagIdx2 = Math.floor(Math.random()*tags.length);
  var favCount = Math.floor(Math.random()*100);
  var username = "xx00"+Math.floor(Math.random()*10);
  var age = 20 + Math.floor(Math.random()*15);
  var book = {
    title: "book-"+i,
    type: types[typeIdx],
    tag: [tags[tagIdx],tags[tagIdx2]],
    favCount: favCount,
    author: {name:username,age:age}
  };
  books.push(book)
}
db.books2.insertMany(books);

$project

Projection: maps original fields to the names you specify, for example projecting the title field of the collection as name.

db.books2.aggregate([{$project:{name:"$title"}}])

$project gives flexible control over the shape of the output documents, and it can also drop fields you do not need.

db.books2.aggregate([{$project:{name:"$title",_id:0,type:1,author:1}}])

Exclude fields from an embedded document (author is itself a document), keeping only author.name:

db.books2.aggregate([
{$project:{name:"$title",_id:0,type:1,"author.name":1}}
])
or
db.books2.aggregate([
{$project:{name:"$title",_id:0,type:1,author:{name:1}}}
])

$match

$match filters documents, so that aggregation can then run on the resulting subset. $match accepts the usual query operators, with the exception of certain geospatial operators such as $near. In practice, place $match as early in the pipeline as possible. This has two benefits: unwanted documents are filtered out quickly, which reduces the work the pipeline has to do, and if $match runs before projection and grouping, the query can use indexes.

db.books2.aggregate([{$match:{type:"technology"}}])

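When a filtering stage is combined with other stages, put it as early as possible; this reduces the number of documents the later stages have to process and improves efficiency.

$count

Counts the documents that reach this stage and returns the number.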

db.books2.aggregate([ 
  {$match:{type:"technology"}}, {$count: "type_count"}
])

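The $match stage selects the documents whose type is technology and passes them to the next stage;

the $count stage returns the count of the documents remaining in the pipeline and assigns that value to type_count.

$group

Groups documents by the specified expression and outputs one document per distinct group to the next stage. Each output document contains an _id field that holds the distinct group key. Output documents can also contain computed fields that hold the values of accumulator expressions, grouped by the _id field of $group. $group does not output the individual input documents, only the aggregated statistics.

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ...
} }

The _id field is required; however, you can set _id to null to compute accumulated values over all input documents as a whole.
The remaining computed fields are optional and are computed with accumulator operators.
_id and the accumulator expressions can accept any valid expression.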

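Accumulator operators

Name	Description	SQL analogue
$avg	Computes the average	avg
$first	Returns the value from the first document of each group: in sort order if the documents are sorted, otherwise in the default stored order	limit 0,1
$last	Returns the value from the last document of each group: in sort order if the documents are sorted, otherwise in the default stored order
$max	Returns the maximum of the corresponding values across all documents in each group	max
$min	Returns the minimum of the corresponding values across all documents in each group	min
$push	Appends the value of the specified expression to an array
$addToSet	Adds the value of the expression to a set (no duplicates, unordered)
$sum	Computes the sum	sum
$stdDevPop	Returns the population standard deviation of the input values
$stdDevSamp	Returns the sample standard deviation of the input values

The $group stage has a memory limit of 100 MB. By default, $group raises an error if the stage exceeds this limit. To process large data sets, set the allowDiskUse option to true so that $group can write to temporary files.

Total number of books, total favorites, and average favorites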

db.books2.aggregate([
{$group:{_id:null,count:{$sum:1},pop:{$sum:"$favCount"},avg:
{$avg:"$favCount"}}}
])

Total favorites of the books for each author

db.books2.aggregate([
{$group:{_id:"$author.name",pop:{$sum:"$favCount"}}}
])

Favorites of each book for each author

db.books2.aggregate([
{$group:{_id:{name:"$author.name",title:"$title"},pop:{$sum:"$favCount"}}}
])

Set of book types for each author

db.books2.aggregate([
{$group:{_id:"$author.name",types:{$addToSet:"$type"}}}
])
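To make the grouping semantics concrete, here is a minimal plain-JavaScript sketch of what a $group stage with $sum and $addToSet computes. It is not how MongoDB implements the stage: groupBy, sum, and addToSet are hypothetical helpers written for this illustration, and the input documents are made up.

```javascript
// One output document per distinct key, each carrying its accumulated fields.
function groupBy(docs, keyFn, accumulators) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    if (!groups.has(key)) {
      // First document of a group: start every accumulator from its initial value.
      const fresh = { _id: key };
      for (const [field, acc] of Object.entries(accumulators)) fresh[field] = acc.init();
      groups.set(key, fresh);
    }
    const out = groups.get(key);
    for (const [field, acc] of Object.entries(accumulators)) {
      out[field] = acc.step(out[field], doc);
    }
  }
  return [...groups.values()];
}

// Hypothetical accumulator helpers that mirror $sum and $addToSet.
const sum = (valueFn) => ({
  init: () => 0,
  step: (total, doc) => total + valueFn(doc),
});
const addToSet = (valueFn) => ({
  init: () => [],
  step: (set, doc) => (set.includes(valueFn(doc)) ? set : [...set, valueFn(doc)]),
});

// Made-up sample data shaped like the books2 documents.
const groupInput = [
  { author: { name: "xx001" }, type: "novel", favCount: 10 },
  { author: { name: "xx002" }, type: "travel", favCount: 5 },
  { author: { name: "xx001" }, type: "novel", favCount: 7 },
  { author: { name: "xx001" }, type: "travel", favCount: 3 },
];

const grouped = groupBy(groupInput, (doc) => doc.author.name, {
  pop: sum((doc) => doc.favCount),
  types: addToSet((doc) => doc.type),
});

console.log(grouped);
// [ { _id: 'xx001', pop: 20, types: [ 'novel', 'travel' ] },
//   { _id: 'xx002', pop: 5, types: [ 'travel' ] } ]
```

Note that the input documents themselves do not appear in the output, only the group key and the accumulated values.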

$unwind

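Splits an array into separate documents, one per element.

Version 3.2 and later support the following syntax:

{
  $unwind:
    {
      // Field path of the array; prefix the field name with $ and enclose it in quotes.
      path: <field path>,
      // Optional. Name of a new field that holds the array index of the element. The name cannot start with $.
      includeArrayIndex: <string>,
      // Optional, default false. If true, $unwind still outputs the document when path is null, missing, or an empty array.
      preserveNullAndEmptyArrays: <boolean>
    }
}

Split the tag array of the books by the author named xx006 into multiple documents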

db.books2.aggregate([
{$match:{"author.name":"xx006"}},
{$unwind:"$tag"}
])

Set of book tags for each author

db.books2.aggregate([
{$unwind:"$tag"},
{$group:{_id:"$author.name",types:{$addToSet:"$tag"}}}
])

Example

Sample data

db.books2.insert([
{
"title" : "book-51",
"type" : "technology",
"favCount" : 11,
"tag":[],
"author" : {
"name" : "ll",
"age" : 28
}
},{
"title" : "book-52",
"type" : "technology",
"favCount" : 15,
"author" : {
"name" : "ll",
"age" : 28
}
},{
"title" : "book-53",
"type" : "technology",
"tag" : [
"nosql",
"document"
],
"favCount" : 20,
"author" : {
"name" : "ll",
"age" : 28
}
}])

Test

// Use the includeArrayIndex option to output the array index of each element
db.books2.aggregate([
{$match:{"author.name":"ll"}},
{$unwind:{path:"$tag", includeArrayIndex: "arrayIndex"}}
])

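Observation: only documents whose tag is a non-empty array are output, and each output document gains an extra field that holds the array index.

// Use the preserveNullAndEmptyArrays option to include documents whose tag field is missing, null, or an empty array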
db.books2.aggregate([
{$match:{"author.name":"ll"}},
{$unwind:{path:"$tag", preserveNullAndEmptyArrays: true}}
])

Observation: every document is output, whether or not its tag is null.
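The two observations above can be reproduced with a small plain-JavaScript sketch of the $unwind rules. This is a simplified model under stated assumptions, not MongoDB's implementation: the unwind helper is hypothetical, it handles only a top-level field, and it assumes that an empty array is dropped from the preserved document and that the index field is set to null when there is no array element.

```javascript
// Simplified model of $unwind on a top-level field (hypothetical helper).
function unwind(docs, field, { includeArrayIndex, preserveNullAndEmptyArrays = false } = {}) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    const nothingToUnwind =
      value === undefined || value === null || (Array.isArray(value) && value.length === 0);
    if (nothingToUnwind) {
      if (preserveNullAndEmptyArrays) {
        const copy = { ...doc };
        if (Array.isArray(value)) delete copy[field]; // assumption: empty array is dropped
        if (includeArrayIndex) copy[includeArrayIndex] = null;
        out.push(copy);
      }
      continue; // without the option, the document produces no output
    }
    // A non-array value is treated like a single-element array.
    const elements = Array.isArray(value) ? value : [value];
    elements.forEach((element, index) => {
      const copy = { ...doc, [field]: element };
      if (includeArrayIndex) copy[includeArrayIndex] = Array.isArray(value) ? index : null;
      out.push(copy);
    });
  }
  return out;
}

// The three sample books by author "ll", reduced to the relevant fields.
const llBooks = [
  { title: "book-51", tag: [] },
  { title: "book-52" },
  { title: "book-53", tag: ["nosql", "document"] },
];

const unwound = unwind(llBooks, "tag", { includeArrayIndex: "arrayIndex" });
const preserved = unwind(llBooks, "tag", { preserveNullAndEmptyArrays: true });

console.log(unwound.length);   // 2, only book-53 contributes documents
console.log(preserved.length); // 4, book-51 and book-52 are kept as well
```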

$limit

Limits the number of documents passed to the next stage in the pipeline

db.books2.aggregate([
{$limit : 5 }
])

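This operation returns only the first 5 documents passed to it by the pipeline. $limit has no effect on the content of the documents it passes along.

Note: when $sort appears immediately before $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit, so MongoDB only needs to keep n items in memory.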
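The memory saving described in the note can be illustrated with a plain-JavaScript top-n sketch. This is an assumption-laden illustration of the general technique (keep a sorted buffer of at most n items while scanning), not MongoDB's actual code; topN and the sample documents are hypothetical.

```javascript
// Keep only the n best documents while scanning, instead of sorting everything.
function topN(docs, n, compare) {
  const best = []; // always sorted by compare, never longer than n
  for (const doc of docs) {
    // Walk back from the end to find where doc belongs in the sorted buffer.
    let pos = best.length;
    while (pos > 0 && compare(doc, best[pos - 1]) < 0) pos--;
    if (pos < n) {
      best.splice(pos, 0, doc);
      if (best.length > n) best.pop(); // drop the document that fell out of the top n
    }
  }
  return best;
}

// Made-up sample data.
const favDocs = [30, 90, 10, 70, 50].map((favCount, i) => ({ title: "book-" + i, favCount }));

// Equivalent in spirit to { $sort: { favCount: -1 } } followed by { $limit: 2 }.
const topTwo = topN(favDocs, 2, (a, b) => b.favCount - a.favCount);

console.log(topTwo.map((doc) => doc.favCount)); // [ 90, 70 ]
```

At no point does the buffer hold more than n documents, regardless of how many documents are scanned.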

$skip

Skips the specified number of documents entering the stage and passes the remaining documents to the next stage in the pipeline

db.books2.aggregate([
{$skip : 5 }
])

$sort

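Sorts all input documents and returns them to the pipeline in sorted order

Syntax:

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

To sort on a field, set the sort order to 1 or -1 for ascending or descending order respectively, as in the following example: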

db.books.aggregate([
{$sort : {favCount:-1,title:1}}
])

$lookup

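Added in MongoDB 3.2, $lookup implements joins across collections, comparable to a multi-table join in a relational database. For each input document, the $lookup stage adds a new array field to the output document (you can name the new key as needed). The array holds the matching documents from the joined collection; if there are none, the array is empty ([ ]).

Syntax:

db.collection.aggregate([{
    $lookup: {
        from: "<collection to join>",
        localField: "<field from the input documents>",
        foreignField: "<field from the documents of the from collection>",
        as: "<output array field>"
    }
}])

Property	Purpose
from	The collection in the same database to join with
localField	The field of the source collection to match on. If an input document has no localField key, it is treated during processing as containing the key-value pair localField: null
foreignField	The field of the joined collection to match on. If a document in the joined collection has no foreignField value, it is treated during processing as containing the key-value pair foreignField: null
as	The name of the new field added to the output documents. If the input documents already contain this field, it is overwritten

Note: null = null evaluates to true here.

The syntax works much like the following pseudo-SQL statement

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT *
                               FROM <collection to join>
                               WHERE <foreignField> = <collection.localField>);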
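The matching rule, including the null = null case, can be sketched in plain JavaScript. This is an illustrative model for simple scalar fields only, not MongoDB's implementation; lookup is a hypothetical helper and the documents are made up.

```javascript
// Left outer join on one field: every input document is kept,
// and the matches from the other collection are attached as an array.
function lookup(inputDocs, fromDocs, { localField, foreignField, as }) {
  // A missing field is treated as null, so a missing value matches a missing value.
  const valueOf = (doc, field) => (doc[field] === undefined ? null : doc[field]);
  return inputDocs.map((doc) => ({
    ...doc,
    [as]: fromDocs.filter(
      (other) => valueOf(other, foreignField) === valueOf(doc, localField)
    ),
  }));
}

// Made-up sample data; customer3 has no orders.
const customers = [
  { customerCode: 1, name: "customer1" },
  { customerCode: 2, name: "customer2" },
  { customerCode: 3, name: "customer3" },
];
const orders = [
  { orderId: 1, customerCode: 1, price: 200 },
  { orderId: 2, customerCode: 2, price: 400 },
  { orderId: 3, customerCode: 1, price: 50 },
];

const joined = lookup(customers, orders, {
  localField: "customerCode",
  foreignField: "customerCode",
  as: "customerOrder",
});

console.log(joined.map((c) => c.customerOrder.length)); // [ 2, 1, 0 ]
```

The customer without orders still appears in the output with an empty array, which is what makes this a left outer join rather than an inner join.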

Example

Data preparation

db.customer.insert({customerCode:1,name:"customer1",phone:"13112345678",address:
"test1"})
db.customer.insert({customerCode:2,name:"customer2",phone:"13112345679",address:
"test2"})
db.order.insert({orderId:1,orderCode:"order001",customerCode:1,price:200})
db.order.insert({orderId:2,orderCode:"order002",customerCode:2,price:400})
db.orderItem.insert({itemId:1,productName:"apples",qutity:2,orderId:1})
db.orderItem.insert({itemId:2,productName:"oranges",qutity:2,orderId:1})
db.orderItem.insert({itemId:3,productName:"mangoes",qutity:2,orderId:1})
db.orderItem.insert({itemId:4,productName:"apples",qutity:2,orderId:2})
db.orderItem.insert({itemId:5,productName:"oranges",qutity:2,orderId:2})
db.orderItem.insert({itemId:6,productName:"mangoes",qutity:2,orderId:2})

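Join query (customer with order)

db.customer.aggregate([
  {$lookup: {
    from: "order",                // the collection to join
    localField: "customerCode",   // join field in this collection
    foreignField: "customerCode", // join field in the joined collection
    as: "customerOrder"           // name of the output array field
     }
  }
])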

Join query (order with orderItem and customer)

db.order.aggregate([
  {$lookup: {
    from: "customer",
    localField: "customerCode",
    foreignField: "customerCode",
    as: "curstomer"
     }
  },
  {$lookup: {
    from: "orderItem",
    localField: "orderId",
    foreignField: "orderId",
    as: "orderItem"
     }
}
])

Aggregation example 1

Count the number of book documents in each category

db.books.aggregate([
{$group:{_id:"$type",total:{$sum:1}}},
{$sort:{total:-1}}
])

Tag popularity ranking, where the popularity of a tag is computed from the favorites (favCount) of the book documents associated with it

db.books.aggregate([
  {$match:{favCount:{$gt:0}}},
  {$unwind:"$tag"},
  {$group:{_id:"$tag",total:        {$sum:"$favCount"}}},
  {$sort:{total:-1}}
])

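$match stage: filters out documents whose favCount is 0.
$unwind stage: unwinds the tag array, so a document with 3 tags is split into 3 entries.
$group stage: groups the unwound documents and computes per group; $sum: "$favCount" accumulates the favCount field.
$sort stage: takes the output of the grouping and sorts it by the total score.

Aggregation example 2

Import the zip code data set: https://media.mongodb.org/zips.json

Copy the content, paste it into a text editor, and save the file as zips.json.

Import the data with the mongoimport tool: Download MongoDB Tools (Download MongoDB Command Line Database Tools | MongoDB)

After the download finishes, open a command prompt (cmd) in the bin directory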

mongoimport -h 101.34.254.161 -d appdb -u root -p root --authenticationDatabase=admin -c zips --file C:\Users\YLi_Jing\Desktop\zips.json

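-h, --host: the address of the remote database to connect to; the local MongoDB instance is used by default

--port: the remote port, 27017 by default

-u, --username: the account used to connect to the remote database; required if authentication is enabled on the database

-p, --password: the password for that account

-d, --db: the database to connect to

-c, --collection: the collection in that database

-f, --fields: the fields to import into the collection

--type: the type of the file to import, one of csv, json, or tsv; json by default

--file: the name of the file to import

--headerline: when importing a csv file, indicates that the first line holds the column names and should not be imported

The data is imported successfully.

Return the states with a population above 10 million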

db.zips.aggregate( [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gt: 10*1000*1000 } } }
] )

The equivalent SQL for this aggregation is:

SELECT state, SUM(pop) AS totalPop
FROM zips
GROUP BY state
HAVING totalPop > (10*1000*1000)

Return the average city population for each state

db.zips.aggregate( [
  { $group: { _id: { state: "$state", city: "$city" }, cityPop: { $sum: "$pop" }
} },
  { $group: { _id: "$_id.state", avgCityPop: { $avg: "$cityPop" } } },
  { $sort:{avgCityPop:-1}}
] )

Return the largest and smallest city for each state

db.zips.aggregate( [
  { $group:
     {
    _id: { state: "$state", city: "$city" },
    pop: { $sum: "$pop" }
     }
  },
  { $sort: { pop: 1 } },
  { $group:
     {
    _id : "$_id.state",
    biggestCity: { $last: "$_id.city" },
    biggestPop: { $last: "$pop" },
    smallestCity: { $first: "$_id.city" },
    smallestPop: { $first: "$pop" }
     }
  },
  { $project:
     { 
    _id: 0,
    state: "$_id",
    biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
    smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
     }
  },
  { $sort: { state: 1 } }
] )

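6.3. MapReduce

A MapReduce operation splits a large data processing job across multiple threads that run in parallel, then merges the results. The Map-Reduce facility MongoDB provides is very flexible and is also quite practical for large-scale data analysis.

MapReduce has two phases:

a map phase, which brings together the document data that share the same key
a reduce phase, which combines the results of the map operation and outputs the statistics

Basic MapReduce syntax: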

db.collection.mapReduce(
  function() {emit(key,value);}, // map function
  function(key,values) {return reduceFunction}, // reduce function
  {
  out: <collection>,
  query: <document>,
  sort: <document>,
  limit: <number>,
  finalize: <function>,
  scope: <document>,
  jsMode: <boolean>,
  verbose: <boolean>,
  bypassDocumentValidation: <boolean>
   }
)

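map: splits the data into key-value pairs and hands them to the reduce function
reduce: computes statistics over the values for each key
out: optional, writes the result into the specified collection
query: optional, a filter condition; only the matching documents are sent to map
sort: sorts the documents before they are sent to map
limit: limits the number of documents sent to map
finalize: optional, modifies the reduce result before it is output
scope: optional, global variables available to map, reduce, and finalize
jsMode: optional, default false. Controls whether intermediate data is converted to BSON format during the MapReduce process
verbose: optional, whether to include timing information in the result; default false
bypassDocumentValidation: optional, whether to skip document validation

Count the favorites of books whose type is travel, per author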

db.books.mapReduce(
  function(){emit(this.author.name,this.favCount)},
  function(key,values){return Array.sum(values)},
   {
  query:{type:"travel"},
  out: "books_favCount"
   }
)
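The flow of the example above (filter with query, emit key-value pairs in map, combine the values per key in reduce) can be sketched in plain JavaScript. This is a single-threaded illustration, not MongoDB's mapReduce: the mapReduce helper here is hypothetical, emit is passed to the map function as an argument rather than being a global, and the documents are made up.

```javascript
// Minimal model of map-reduce over an in-memory array of documents.
function mapReduce(docs, mapFn, reduceFn, { query = () => true } = {}) {
  const emitted = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Map phase: each matching document is bound to `this`, as in the shell example.
  for (const doc of docs.filter(query)) mapFn.call(doc, emit);
  // Reduce phase: keys with a single value are passed through without calling reduce.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Made-up sample data shaped like the books documents.
const mrBooks = [
  { type: "travel", favCount: 10, author: { name: "xx001" } },
  { type: "travel", favCount: 5, author: { name: "xx001" } },
  { type: "travel", favCount: 7, author: { name: "xx002" } },
  { type: "novel", favCount: 100, author: { name: "xx001" } },
];

const favByAuthor = mapReduce(
  mrBooks,
  function (emit) { emit(this.author.name, this.favCount); },
  (key, values) => values.reduce((total, v) => total + v, 0),
  { query: (doc) => doc.type === "travel" }
);

console.log(favByAuthor);
// [ { _id: 'xx001', value: 15 }, { _id: 'xx002', value: 7 } ]
```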

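6.4. Aggregation with MongoDB in Spring Boot

Supported operation	Java API	Description
$project	Aggregation.project	Reshapes the input documents
$match	Aggregation.match	Filters the data
$limit	Aggregation.limit	Limits the number of documents the aggregation pipeline returns
$skip	Aggregation.skip	Skips the specified number of documents
$unwind	Aggregation.unwind	Splits an array field of a document into multiple documents
$group	Aggregation.group	Groups documents, used to compute statistics
$sort	Aggregation.sort	Sorts the input documents and outputs them
$geoNear	Aggregation.geoNear	Outputs documents ordered by proximity to a geographic location

Expressions available with the Aggregation.group operation

Aggregation expression	Java API	Description
$sum	Aggregation.group().sum("field").as("sum")	Sum
$avg	Aggregation.group().avg("field").as("avg")	Average
$min	Aggregation.group().min("field").as("min")	Minimum
$max	Aggregation.group().max("field").as("max")	Maximum
$push	Aggregation.group().push("field").as("push")	Appends values to an array in the result document
$addToSet	Aggregation.group().addToSet("field").as("addToSet")	Adds values to an array in the result document without creating duplicates
$first	Aggregation.group().first("field").as("first")	Returns the data of the first document according to the sort order of the source documents
$last	Aggregation.group().last("field").as("last")	Returns the data of the last document according to the sort order of the source documents

Example: based on aggregation example 2

Entity structure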

@Document("zips")
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Zips {
    /**
     * Maps to the _id field of the document
     */
    @Id
    private String id;
    @Field
    private String city;
    @Field
    private Double[] loc;
    private Integer pop;
    @Field
    private String state;
}

Return the states with a population above 10 million

db.zips.aggregate( [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gt: 10*1000*1000 } } }
] )

Java implementation

@Test
void test1(){
    // $group
    GroupOperation group = Aggregation.group("state")
        .sum("pop")
        .as("totalPop");
    // $match
    MatchOperation match = Aggregation
        .match(Criteria.where("totalPop")
               .gt(10*1000*1000));
    // Combine the aggregation steps in order
    TypedAggregation<Zips> zipsTypedAggregation = Aggregation.newAggregation(Zips.class, group, match);
    // Run the aggregation
    AggregationResults<Map> aggregate = mongoTemplate.aggregate(zipsTypedAggregation, Map.class);
    // Retrieve the final results
    List<Map> mappedResults = aggregate.getMappedResults();
    for (Map mappedResult : mappedResults) {
        System.out.println(mappedResult);
    }
}

Return the average city population for each state

db.zips.aggregate( [
  { $group: { _id: { state: "$state", city: "$city" }, cityPop: { $sum: "$pop" }
} },
  { $group: { _id: "$_id.state", avgCityPop: { $avg: "$cityPop" } } },
  { $sort:{avgCityPop:-1}}
] )

Java implementation

@Test
void test2() {
    //$group
    GroupOperation groupOperation = Aggregation.group("state","city")
        .sum("pop").as("cityPop");
    //$group
    GroupOperation groupOperation2 = Aggregation.group("_id.state")
        .avg("cityPop").as("avgCityPop");
    //$sort
    SortOperation sortOperation = Aggregation.sort(Sort.Direction.DESC,"avgCityPop");
    // Combine the aggregation steps in order
    TypedAggregation<Zips> typedAggregation = Aggregation.newAggregation(Zips.class,
                                                                         groupOperation, groupOperation2,sortOperation);
    // Run the aggregation; instead of Map, a custom entity class can also receive the data
    AggregationResults<Map> aggregationResults = mongoTemplate
        .aggregate(typedAggregation, Map.class);
    // Retrieve the final results
    List<Map> mappedResults = aggregationResults.getMappedResults();
    for(Map map:mappedResults){
        System.out.println(map);
    }
}

Return the largest and smallest city for each state

db.zips.aggregate( [
  { $group:
     {
    _id: { state: "$state", city: "$city" },
    pop: { $sum: "$pop" }
     }
  },
  { $sort: { pop: 1 } },
  { $group:
     {
    _id : "$_id.state",
    biggestCity: { $last: "$_id.city" },
    biggestPop: { $last: "$pop" },
    smallestCity: { $first: "$_id.city" },
    smallestPop: { $first: "$pop" }
     }
  },
  { $project:
     { 
    _id: 0,
    state: "$_id",
    biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
    smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
     }
  },
  { $sort: { state: 1 } }
] )

Java implementation

@Test
void test3() {
    //$group
    GroupOperation groupOperation = Aggregation.group("state", "city")
        .sum("pop").as("pop");
    //$sort
    SortOperation sortOperation = Aggregation.sort(Sort.Direction.ASC, "pop");
    //$group
    GroupOperation groupOperation2 = Aggregation.group("_id.state")
        .last("_id.city").as("biggestCity")
        .last("pop").as("biggestPop")
        .first("_id.city").as("smallestCity")
        .first("pop").as("smallestPop");
    //$project
    ProjectionOperation projectionOperation = Aggregation
        .project("biggestCity", "smallestCity", "state")
        .andExclude("_id")
        .andExpression(" { name: \"$biggestCity\",  pop: \"$biggestPop\" }")
        .as("biggestCity")
        .andExpression("{ name: \"$smallestCity\", pop: \"$smallestPop\" }")
        .as("smallestCity")
        .and("_id").as("state");
    //$sort
    SortOperation sortOperation2 = Aggregation.sort(Sort.Direction.ASC, "state");
    // Combine the aggregation steps in order
    TypedAggregation<Zips> typedAggregation = Aggregation.newAggregation(Zips.class,
                                                                         groupOperation, sortOperation, groupOperation2, projectionOperation,
                                                                         sortOperation2);
    // Run the aggregation; instead of Map, a custom entity class can also receive the data
    AggregationResults<Map> aggregationResults = mongoTemplate
        .aggregate(typedAggregation, Map.class);
    // Retrieve the final results
    List<Map> mappedResults = aggregationResults.getMappedResults();
    for (Map map : mappedResults) {
        System.out.println(map);
    }
}

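7. MongoDB Indexes

7.1. Introduction to Indexes

An index is a data structure for looking up data quickly. The B+Tree is a commonly used data structure for database indexes; MongoDB uses a B+Tree for its indexes, which are created on collections. A MongoDB query that does not use an index first scans all documents and then matches those that satisfy the condition. A query that uses an index locates the documents through the index, which can greatly improve query efficiency.

MongoDB index data structure

Question: is the MongoDB index data structure a B-Tree or a B+Tree?

The term B-Tree comes from the official documentation, and this has led to disagreement: some say MongoDB indexes use a B-Tree, others say a B+Tree.

MongoDB official documentation: https://docs.mongodb.com/manual/indexes/

Classification of indexes

By the number of fields an index contains, indexes are either single-field indexes or compound indexes.
By the type of the indexed field, indexes are either primary key indexes or non-primary-key indexes.
By how index nodes correspond to physical records, indexes are either clustered or non-clustered: a clustered index stores the data record directly in the index node, while a non-clustered index holds only a pointer to the data record.
By their characteristics, indexes can also be unique indexes, sparse indexes, text indexes, geospatial indexes, and so on.

Like most databases, MongoDB supports a rich set of index types, including commonly used structures such as single-field, compound, and unique indexes. Because it uses a flexible document model, it also supports indexing embedded fields and arrays. Well-chosen indexes can greatly speed up data retrieval. For special use cases, MongoDB also supports geospatial indexes, text search indexes, TTL indexes, and other features.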

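7.2. Index Operations

Creating an index

db.collection.createIndex(keys, options)

keys specifies the fields to index: 1 creates the index in ascending order, -1 in descending order.
The optional parameters are listed below:

Parameter	Type	Description
background	Boolean	Building an index blocks other database operations; background makes the index build run in the background. The default is false.
unique	Boolean	Whether the index is unique. Specify true to create a unique index. The default is false.
name	string	The name of the index. If unspecified, MongoDB generates a name by concatenating the names of the indexed fields and their sort order.
dropDups	Boolean	Deprecated since version 3.0. Whether to delete duplicate records when building a unique index. The default is false.
sparse	Boolean	Does not index documents that lack the indexed field. Take particular care with this option: if it is true, queries that use the index do not return documents without that field. The default is false.
expireAfterSeconds	integer	A value in seconds that sets the TTL, which controls how long documents in the collection live.
v	index version	The index version number. The default version depends on the mongod version running when the index is created.
weights	document	The index weight, a number from 1 to 99,999, which gives the scoring weight of this indexed field relative to the other indexed fields.
default_language	string	For a text index, determines the list of stop words and the rules for the stemmer and tokenizer. The default is English.
language_override	string	For a text index, the name of the document field that holds the override language for that document. The default is language.

Note: before version 3.0.0, the method for creating an index was db.collection.ensureIndex()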

// Create an index in the background
db.values.createIndex({open: 1, close: 1}, {background: true})
// Create a unique index
db.values.createIndex({title:1},{unique:true})

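Viewing index size

db.collection.totalIndexSize([is_detail])

is_detail: optional. Passing any value other than 0 or false shows the size of each index in the collection as well as the total. Passing 0 or false shows only the total size of all indexes in the collection. The default is false.

Dropping indexes

// Drop the specified index of a collection
db.col.dropIndex("index name")
// Drop all indexes of a collection
db.col.dropIndexes()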

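7.3. Index Types

Single Field Indexes

An index built on one specific field. MongoDB creates a unique single-field index on _id, so _id is frequently used for queries. Exact matches, sorts, and range queries on the indexed field all use this index.

db.books.createIndex({title:1})

Creating an index on an embedded document:

Compound Index

A compound index is an index made up of multiple fields, and it behaves much like a single-field index. The difference is that the order of the fields and the ascending or descending direction of each field directly affect query performance, so the different query scenarios need to be considered when designing a compound index.

db.books.createIndex({type:1,favCount:1})

Multikey Index

An index built on an array field. A query for any value in the array locates the document; that is, multiple index entries or key values refer to the same document.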

Prepare the inventory collection:

db.inventory.insertMany([
{ _id: 5, type: "food", item: "aaa", ratings: [ 5, 8, 9 ] },
{ _id: 6, type: "food", item: "bbb", ratings: [ 5, 9 ] },
{ _id: 7, type: "food", item: "ccc", ratings: [ 9, 5, 8 ] },
{ _id: 8, type: "food", item: "ddd", ratings: [ 9, 5 ] },
{ _id: 9, type: "food", item: "eee", ratings: [ 5, 9, 5 ] }
])

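Create a multikey index

db.inventory.createIndex( { ratings: 1 } )

Multikey indexes are easily confused with compound indexes. A compound index combines multiple fields, whereas a multikey index simply has multiple keys (multi key) on a single field. In practice, a multikey index can also be part of a compound index.

// Create a compound multikey index
db.inventory.createIndex( { item:1,ratings: 1} )

Note: MongoDB does not support more than one array field in a single compound index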
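The phrase "multiple index entries refer to the same document" can be pictured with a plain-JavaScript sketch that builds the entries a multikey index on ratings would conceptually contain. This is a mental model only, not MongoDB's storage format: buildIndexEntries is a hypothetical helper, and it assumes that a value repeated within one array produces a single entry for that document.

```javascript
// Build a sorted list of { key, id } entries: one entry per distinct array element.
function buildIndexEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // Assumption: duplicate values inside one array yield a single entry.
    const keys = Array.isArray(value) ? [...new Set(value)] : [value];
    for (const key of keys) entries.push({ key, id: doc._id });
  }
  // An index keeps its entries ordered by key (ties broken by document id here).
  entries.sort((a, b) => a.key - b.key || a.id - b.id);
  return entries;
}

// The inventory documents from above, reduced to the relevant fields.
const inventoryDocs = [
  { _id: 5, ratings: [5, 8, 9] },
  { _id: 6, ratings: [5, 9] },
  { _id: 7, ratings: [9, 5, 8] },
  { _id: 8, ratings: [9, 5] },
  { _id: 9, ratings: [5, 9, 5] },
];

const ratingEntries = buildIndexEntries(inventoryDocs, "ratings");

// Looking up ratings: 8 only has to read the entries whose key is 8.
const idsWithRating8 = ratingEntries.filter((e) => e.key === 8).map((e) => e.id);

console.log(ratingEntries.length); // 12
console.log(idsWithRating8);       // [ 5, 7 ]
```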

Indexing arrays of embedded documents

db.inventory.insertMany([
{
_id: 1,
item: "abc",
stock: [
{ size: "S", color: "red", quantity: 25 },
{ size: "S", color: "blue", quantity: 10 },
{ size: "M", color: "blue", quantity: 50 }
]
},
{
_id: 2,
item: "def",
stock: [
{ size: "S", color: "blue", quantity: 20 },
{ size: "M", color: "blue", quantity: 5 },
{ size: "M", color: "black", quantity: 10 },
{ size: "L", color: "red", quantity: 2 }
]
},
{
_id: 3,
item: "ijk",
stock: [
{ size: "M", color: "blue", quantity: 15 },
{ size: "L", color: "blue", quantity: 100 },
{ size: "L", color: "red", quantity: 25 }
]
}
])

Create a multikey index on an array field that contains embedded objects

db.inventory.createIndex( { "stock.size": 1, "stock.quantity": 1 } )

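Geospatial Index

In the mobile internet era, location-based search (LBS) is a standard feature of almost every application. MongoDB provides very convenient support for geospatial queries. The geospatial index (2dsphere index) is a special index dedicated to location search.

Example: how does MongoDB implement "find nearby merchants"?

Suppose the merchant data model is as follows: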

db.restaurant.insert({
  restaurantId: 0,
  restaurantName:"兰州牛肉面",
  location : {
    type: "Point",
    coordinates: [ -73.97, 40.77 ]
    }
})

Create a 2dsphere index

db.restaurant.createIndex({location : "2dsphere"})

Find merchants within 10000 meters

db.restaurant.find( {
  location:{
    $near :{
      $geometry :{
        type : "Point" ,
        coordinates : [ -73.88, 40.78 ]
      } ,
      $maxDistance:10000
    }
  }
} )

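Text Indexes

MongoDB supports full-text search; a text index provides simple tokenized search.

db.reviews.createIndex( { comments: "text" } )

The $text operator runs a text search on a collection that has a text index. $text tokenizes the search string using whitespace and punctuation as delimiters and performs a logical OR over all of the resulting tokens.

A text index meets the need for fast text lookup. For example, given a collection of blog posts that must be searched quickly by content, you can build a text index on the post content.

Example

Data preparation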

db.stores.insert(
[
{ _id: 1, name: "Java Hut", description: "Coffee and cakes" },
{ _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
{ _id: 3, name: "Coffee Shop", description: "Just coffee" },
{ _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing"
},
{ _id: 5, name: "Java Shopping", description: "Indonesian goods" }
]
)

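Create a text index on name and description

db.stores.createIndex({name: "text", description: "text"})

Test

Use the $text operator to find all stores whose data contain any of the words "coffee", "shop", or "java"

db.stores.find({$text: {$search: "java coffee shop"}})

MongoDB's text index feature has many limitations, and MongoDB does not officially provide Chinese word segmentation, which makes its use cases rather limited.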
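The tokenize-then-OR behavior of the search above can be sketched in plain JavaScript. This is a deliberately naive model: tokenize and textSearch are hypothetical helpers, and the sketch does no stemming, stop-word removal, or relevance scoring, all of which a real text index involves.

```javascript
// Split on anything that is not a letter or digit and lowercase the tokens.
const tokenize = (text) => text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// A document matches if ANY search term appears in ANY of the indexed fields.
function textSearch(docs, fields, search) {
  const terms = new Set(tokenize(search));
  return docs.filter((doc) =>
    fields.some((field) => tokenize(doc[field]).some((token) => terms.has(token)))
  );
}

// The stores documents from the example above.
const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

const matchedStores = textSearch(stores, ["name", "description"], "java coffee shop");

console.log(matchedStores.map((store) => store._id)); // [ 1, 3, 5 ]
```

Because the terms are combined with OR, a store needs only one of the three words to match; "Java Shopping" matches here through "java" alone.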

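Hashed Indexes

Unlike a traditional B-Tree index, a hashed index uses a hash function to build the index. It supports exact matches on the indexed field, but it does not support range queries or multikey hashing. Entries in a hashed index are evenly distributed, which is very useful in sharded collections.

db.users.createIndex({username : 'hashed'})

Wildcard Indexes

MongoDB document schemas change dynamically, and a wildcard index can be built on fields that are not known in advance to speed up queries. MongoDB 4.2 introduced wildcard indexes to support queries against unknown or arbitrary fields.

Example

Prepare product data in which different products have different attributes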

db.products.insert([
  {
    "product_name" : "Spy Coat",
    "product_attributes" : {
    "material" : [ "Tweed", "Wool", "Leather" ],
    "size" : {
      "length" : 72,
      "units" : "inches"
      }
    }
  },
  {
    "product_name" : "Spy Pen",
    "product_attributes" : {
      "colors" : [ "Blue", "Black" ],
      "secret_feature" : {
        "name" : "laser",
        "power" : "1000",
        "units" : "watts",
         }
       }
    },
    {
      "product_name" : "Spy Book"
    }
])

Create a wildcard index

db.products.createIndex( { "product_attributes.$**" : 1,"product_name":1 } )

Test

The wildcard index can support any single-field query on product_attributes or its embedded fields:

db.products.find( { "product_attributes.size.length" : { $gt : 60 } } )
db.products.find( { "product_attributes.material" : "Leather" } )
db.products.find( { "product_attributes.secret_feature.name" : "laser" } )

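Caveats

Index types or properties incompatible with wildcard indexes

A wildcard index is sparse and does not index empty fields, so it cannot support queries for documents in which a field does not exist.

// A wildcard index cannot support the following queries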
db.products.find( {"product_attributes" : { $exists : false } } )
db.products.aggregate([
{ $match : { "product_attributes" : { $exists : false } } }
])

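A wildcard index generates entries for the contents of a document or array, not for the document or array itself. It therefore cannot support exact document or array equality matches. A wildcard index can support queries where a field equals the empty document {}.

// A wildcard index cannot support the following queries: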
db.products.find({ "product_attributes.colors" : [ "Blue", "Black" ] } )
db.products.aggregate([{
$match : { "product_attributes.colors" : [ "Blue", "Black" ] }
}])

An Efficient NoSQL Database Tool: The Complete MongoDB Tutorial (Part 3)
