Quick Look: How Elasticsearch Handles Null Values (Mastering Elasticsearch 4), Part 2

2022-05-21

Date Range

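A date range aggregation defines its buckets from explicit date ranges, for example splitting the timestamp field into the time periods you specify.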

POST /kibana_sample_data_flights/_search
{
  "size":0,
  "aggs":{
    "data_range_timestamp":{
      "date_range":{
        "field":"timestamp",
        "format":"yyyy-MM",
        "ranges":[
          {"from":"2022-01","to":"2022-02"},
          {"from":"2022-02","to":"2022-03"}
        ]
      }
    }
  }
}

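The response is shown below. As an exercise, think about how you would give each bucket a fixed key. Also pay attention to the date format: the from and to values must match the pattern given in format (yyyy-MM here; a full timestamp would use a pattern such as yyyy-MM-dd HH:mm:ss).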

"aggregations" : {
    "data_range_timestamp" : {
      "buckets" : [
        {
          "key" : "2022-01-2022-02",
          "from" : 1.6409952E12,
          "from_as_string" : "2022-01",
          "to" : 1.6436736E12,
          "to_as_string" : "2022-02",
          "doc_count" : 9580
        },
        {
          "key" : "2022-02-2022-03",
          "from" : 1.6436736E12,
          "from_as_string" : "2022-02",
          "to" : 1.6460928E12,
          "to_as_string" : "2022-03",
          "doc_count" : 1837
        }
      ]
    }
  }
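
One possible answer to the fixed-key question: each entry in ranges accepts an optional key, which replaces the generated from-to string such as 2022-01-2022-02. The Python sketch below only builds the request body and prints it. The helper date_range_body and the key names jan_2022 and feb_2022 are invented for illustration, and sending the request is left out.

```python
import json

def date_range_body(field, fmt, ranges):
    """Build a date_range aggregation body with a caller-chosen key per range.

    `ranges` is a list of (key, from, to) tuples; this helper is hypothetical
    and not part of any Elasticsearch client library.
    """
    return {
        "size": 0,
        "aggs": {
            "data_range_timestamp": {
                "date_range": {
                    "field": field,
                    "format": fmt,
                    "ranges": [
                        {"key": key, "from": start, "to": end}
                        for key, start, end in ranges
                    ],
                }
            }
        },
    }

# Same two ranges as the request above, now with fixed (made-up) keys.
body = date_range_body(
    "timestamp",
    "yyyy-MM",
    [("jan_2022", "2022-01", "2022-02"), ("feb_2022", "2022-02", "2022-03")],
)
print(json.dumps(body, indent=2))
```

Pasting the printed JSON under POST /kibana_sample_data_flights/_search should return buckets whose key is jan_2022 and feb_2022 rather than the generated range strings.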

Histogram

A histogram aggregation splits the data at a fixed interval, for example bucketing the AvgTicketPrice field in steps of 50.

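interval: the width of each bucket, 50 here
min_doc_count: the minimum number of documents a bucket must contain to be returned; 0 here, so empty buckets are included
extended_bounds: only meaningful when min_doc_count is 0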

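In practice you will notice that extended_bounds does not filter buckets. If extended_bounds.min is higher than the values extracted from the documents, the documents still determine what the first bucket is (the same holds for extended_bounds.max and the last bucket). To filter buckets, nest the histogram aggregation under a range filter aggregation with appropriate from/to settings.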
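
The behaviour described above can be imitated in a few lines of Python. This is only a sketch of the bucketing rule, not Elasticsearch's implementation: histogram_buckets is a hypothetical helper and the prices are invented. It assumes the default offset of 0, so each value lands in the bucket floor(value / interval) * interval, and the bounds can only widen the range of returned buckets.

```python
import math

def bucket_key(value, interval):
    """Lower edge of the bucket a value falls into (offset assumed to be 0)."""
    return math.floor(value / interval) * interval

def histogram_buckets(values, interval, ext_min=None, ext_max=None):
    """Imitate a histogram aggregation with min_doc_count = 0."""
    counts = {}
    for value in values:
        key = bucket_key(value, interval)
        counts[key] = counts.get(key, 0) + 1

    # The documents decide the first and last bucket ...
    edges = list(counts)
    # ... and extended_bounds may only push those edges outwards.
    if ext_min is not None:
        edges.append(bucket_key(ext_min, interval))
    if ext_max is not None:
        edges.append(bucket_key(ext_max, interval))
    if not edges:
        return {}

    first, last = min(edges), max(edges)
    steps = round((last - first) / interval)
    # min_doc_count = 0: every bucket between the edges is kept, even empty ones.
    return {first + i * interval: counts.get(first + i * interval, 0)
            for i in range(steps + 1)}

prices = [120.5, 130.0, 180.0, 420.0]  # invented sample values

# Bounds wider than the data: empty buckets are added on both sides.
widened = histogram_buckets(prices, 50, ext_min=0, ext_max=600)
# Bounds narrower than the data: nothing is filtered out.
not_filtered = histogram_buckets(prices, 50, ext_min=200, ext_max=300)

print(widened)
print(not_filtered)
```

With the narrower bounds the first bucket is still 100 and the last is still 400, which is why a separate range filter is needed to actually drop buckets.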

POST /kibana_sample_data_flights/_search
{
  "size":0,
  "aggs":{
    "price_histogram":{
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 50,
        "min_doc_count": 0,
        "extended_bounds":{
          "min":0,
          "max":600
        }
      }
    }
  }
}

Response (only the first few buckets are shown):

"aggregations" : {
    "price_histogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 0
        },
        {
          "key" : 50.0,
          "doc_count" : 0
        },
        {
          "key" : 100.0,
          "doc_count" : 380
        },
        {
          "key" : 150.0,
          "doc_count" : 369
        },
        {
          "key" : 200.0,
          "doc_count" : 398
        }
      ]
    }
  }
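
To actually restrict which buckets come back, the histogram can be nested under a filter aggregation that carries a range query. The Python sketch below only assembles such a request body. The helper filtered_histogram_body, the aggregation name price_in_range, and the 100 to 300 limits are assumptions chosen for illustration.

```python
import json

def filtered_histogram_body(field, interval, gte, lt):
    """Build a request body whose histogram only sees documents inside [gte, lt).

    Hypothetical helper for illustration; it only builds a dict.
    """
    return {
        "size": 0,
        "aggs": {
            "price_in_range": {  # made-up aggregation name
                # The filter aggregation narrows the documents first ...
                "filter": {"range": {field: {"gte": gte, "lt": lt}}},
                # ... and the histogram runs only on what is left.
                "aggs": {
                    "price_histogram": {
                        "histogram": {
                            "field": field,
                            "interval": interval,
                            "min_doc_count": 0,
                        }
                    }
                },
            }
        },
    }

body = filtered_histogram_body("AvgTicketPrice", 50, gte=100, lt=300)
print(json.dumps(body, indent=2))
```

Because the filter runs before the histogram, documents outside the range never reach the bucketing step, unlike with extended_bounds.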

Date histogram

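A date histogram is a histogram (bar chart) over dates and is one of the most common aggregations in time-series analysis, for example bucketing the timestamp field by month. Note that the interval parameter used in the request was deprecated in Elasticsearch 7.2 in favor of calendar_interval and fixed_interval, and was removed in 8.0.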

POST /kibana_sample_data_flights/_search
{
  "size":0,
  "aggs":{
    "timestamp_data_histogram":{
      "date_histogram": {
        "field": "timestamp",
        "interval": "month",
        "min_doc_count": 0,
        "format": "yyyy-MM-dd",
        "extended_bounds": {
          "min": "2021-10-10",
          "max": "2022-01-19"
        }
      }
    }
  }
}

Response:

"aggregations" : {
    "timestamp_data_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2021-10-01",
          "key" : 1633046400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-11-01",
          "key" : 1635724800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-12-01",
          "key" : 1638316800000,
          "doc_count" : 1642
        },
        {
          "key_as_string" : "2022-01-01",
          "key" : 1640995200000,
          "doc_count" : 9580
        },
        {
          "key_as_string" : "2022-02-01",
          "key" : 1643673600000,
          "doc_count" : 1837
        }
      ]
    }
  }
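
Two details of this response are worth reproducing: the bound 2021-10-10 shows up as the bucket 2021-10-01, and a 2022-02-01 bucket appears even though extended_bounds.max is 2022-01-19. The Python sketch below imitates monthly bucketing in UTC under those assumptions. The helper names and the four document timestamps are invented, and it is not how Elasticsearch is implemented.

```python
from datetime import datetime, timezone

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

def month_start(dt):
    """Round a timestamp down to the first instant of its month."""
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def next_month(dt):
    """Advance a month-start timestamp by one calendar month."""
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1)
    return dt.replace(month=dt.month + 1)

def monthly_buckets(timestamps, ext_min, ext_max):
    """Imitate a monthly date histogram with min_doc_count = 0."""
    counts = {}
    for ts in timestamps:
        key = month_start(ts)
        counts[key] = counts.get(key, 0) + 1

    # Bounds are rounded to their month and can only widen the bucket range.
    edges = list(counts) + [month_start(ext_min), month_start(ext_max)]
    current, last = min(edges), max(edges)

    buckets = {}
    while current <= last:
        buckets[current.strftime("%Y-%m-%d")] = counts.get(current, 0)
        current = next_month(current)
    return buckets

# Invented documents; one of them lies after extended_bounds.max.
docs = [utc(2021, 12, 25), utc(2022, 1, 3), utc(2022, 1, 20), utc(2022, 2, 5)]
buckets = monthly_buckets(docs, ext_min=utc(2021, 10, 10), ext_max=utc(2022, 1, 19))
first_key_ms = int(month_start(utc(2021, 10, 10)).timestamp() * 1000)

print(buckets)
print(first_key_ms)  # 1633046400000, the key of the first bucket in the response
```

The February bucket survives because a document falls into it; as with the numeric histogram, the bounds add empty buckets but never remove populated ones.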

2. Nested queries

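The sections above listed five ways to build buckets. In real projects a single standalone aggregation is rare; most of the time aggregations are nested inside one another.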

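The example below first buckets the flights by ticket price (AvgTicketPrice), then computes the count, minimum, maximum, average, and sum for the documents in each bucket.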

POST /kibana_sample_data_flights/_search
{
  "size":0,
  "aggs":{
    "price_range":{
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          {"to":300},
          {"from":300,"to":600},
          {"from":600}
        ]
      },
      "aggs":{
        "price_status":{
          "stats": {
            "field": "AvgTicketPrice"
          }
        }
      }
    }
  }
}

Response (truncated):

"aggregations" : {
    "price_range" : {
      "buckets" : [
        {
          "key" : "*-300.0",
          "to" : 300.0,
          "doc_count" : 1816,
          "price_status" : {
            "count" : 1816,
            "min" : 100.0205307006836,
            "max" : 299.9529113769531,
            "avg" : 212.5348257619379,
            "sum" : 385963.2435836792
          }
        }
      ]
    }
  }
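
What the nested request computes can be pictured with plain Python: split the values into ranges, then run the statistics on each group. This is a sketch with invented prices and a hypothetical helper range_with_stats. It follows the range aggregation's convention that from is inclusive and to is exclusive.

```python
def edge_label(edge):
    """Render a range edge the way the bucket keys above do."""
    return "*" if edge is None else str(float(edge))

def range_with_stats(values, ranges):
    """Group values into [from, to) ranges and compute stats per group."""
    result = {}
    for lo, hi in ranges:
        # from is inclusive, to is exclusive; None means open-ended.
        members = [v for v in values
                   if (lo is None or v >= lo) and (hi is None or v < hi)]
        # An empty bucket only reports its count in this sketch.
        stats = {"count": len(members)}
        if members:
            total = sum(members)
            stats.update(min=min(members), max=max(members),
                         avg=total / len(members), sum=total)
        result[f"{edge_label(lo)}-{edge_label(hi)}"] = stats
    return result

prices = [120.0, 250.0, 300.0, 450.0, 900.0]  # invented sample values
result = range_with_stats(prices, [(None, 300), (300, 600), (600, None)])

for key, stats in result.items():
    print(key, stats)
```

A price of exactly 300 lands in the 300.0-600.0 bucket, not in *-300.0, which matches the max of 299.95 seen in the first bucket of the response.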
