Date Range
Buckets documents by explicitly specified date ranges, for example splitting the timestamp field into the time periods you define.
POST /kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "data_range_timestamp": {
      "date_range": {
        "field": "timestamp",
        "format": "yyyy-MM",
        "ranges": [
          { "from": "2022-01", "to": "2022-02" },
          { "from": "2022-02", "to": "2022-03" }
        ]
      }
    }
  }
}
Response. As an exercise, think about how you would give each bucket a fixed key. Also pay attention to the date format: the from and to values must match the pattern given in format, so full timestamps would need yyyy-MM-dd HH:mm:ss.
"aggregations" : {
  "data_range_timestamp" : {
    "buckets" : [
      {
        "key" : "2022-01-2022-02",
        "from" : 1.6409952E12,
        "from_as_string" : "2022-01",
        "to" : 1.6436736E12,
        "to_as_string" : "2022-02",
        "doc_count" : 9580
      },
      {
        "key" : "2022-02-2022-03",
        "from" : 1.6436736E12,
        "from_as_string" : "2022-02",
        "to" : 1.6460928E12,
        "to_as_string" : "2022-03",
        "doc_count" : 1837
      }
    ]
  }
}
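Histogram
A histogram splits the data into buckets at a fixed interval, for example bucketing the AvgTicketPrice field at an interval of 50.
- interval: the width of each bucket, 50 here
- min_doc_count: the minimum number of documents a bucket must contain to be returned; 0 here, so empty buckets are returned too
- extended_bounds: forces the buckets to span the given min and max; only meaningful when min_doc_count is 0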
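In practice you will find that extended_bounds does not filter buckets. If extended_bounds.min is higher than the values extracted from the documents, the documents still determine what the first bucket is (the same holds for extended_bounds.max and the last bucket). To filter buckets, nest the histogram aggregation inside a range filter aggregation with appropriate from/to settings.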
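A minimal sketch of that nesting, written as a Python dict so the structure is easy to inspect. The helper build_filtered_histogram and the aggregation name filtered_prices are hypothetical. The filter aggregation with a range query restricts which documents reach the histogram, so no bucket outside the range can be produced by a document. The sketch assumes both limits are multiples of the interval.

```python
import json


def build_filtered_histogram(field, interval, lower, upper):
    """Wrap a histogram in a filter aggregation so only lower <= value < upper is bucketed."""
    return {
        "size": 0,
        "aggs": {
            "filtered_prices": {  # hypothetical aggregation name
                # Only documents inside the range reach the sub-aggregation.
                "filter": {"range": {field: {"gte": lower, "lt": upper}}},
                "aggs": {
                    "price_histogram": {
                        "histogram": {
                            "field": field,
                            "interval": interval,
                            "min_doc_count": 0,
                            # The last bucket starts at upper - interval
                            # (assumes lower and upper are multiples of interval).
                            "extended_bounds": {"min": lower, "max": upper - interval},
                        }
                    }
                },
            }
        },
    }


filtered_body = build_filtered_histogram("AvgTicketPrice", 50, 0, 600)
print(json.dumps(filtered_body, indent=2))
```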
Request:
POST /kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 50,
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 0,
          "max": 600
        }
      }
    }
  }
}
Response (only the first few buckets are shown):
"aggregations" : {
  "price_histogram" : {
    "buckets" : [
      { "key" : 0.0, "doc_count" : 0 },
      { "key" : 50.0, "doc_count" : 0 },
      { "key" : 100.0, "doc_count" : 380 },
      { "key" : 150.0, "doc_count" : 369 },
      { "key" : 200.0, "doc_count" : 398 }
    ]
  }
}
Date Histogram
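A histogram over dates, one of the most commonly used aggregation types in time-series analysis. The example buckets the timestamp field by month. The interval parameter of date_histogram was deprecated in Elasticsearch 7.2 in favor of calendar_interval and fixed_interval and removed in 8.0, so the request below uses calendar_interval.
POST /kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "timestamp_data_histogram": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month",
        "min_doc_count": 0,
        "format": "yyyy-MM-dd",
        "extended_bounds": {
          "min": "2021-10-10",
          "max": "2022-01-19"
        }
      }
    }
  }
}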
Response:
"aggregations" : {
  "timestamp_data_histogram" : {
    "buckets" : [
      { "key_as_string" : "2021-10-01", "key" : 1633046400000, "doc_count" : 0 },
      { "key_as_string" : "2021-11-01", "key" : 1635724800000, "doc_count" : 0 },
      { "key_as_string" : "2021-12-01", "key" : 1638316800000, "doc_count" : 1642 },
      { "key_as_string" : "2022-01-01", "key" : 1640995200000, "doc_count" : 9580 },
      { "key_as_string" : "2022-02-01", "key" : 1643673600000, "doc_count" : 1837 }
    ]
  }
}
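II. Nested Aggregations
The sections above listed five ways of bucketing. In real projects a single standalone aggregation is rare; most of the time aggregations are nested inside one another.
For example, first bucket the documents by ticket price, then compute the count, minimum, maximum, average, and sum for each bucket: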
POST /kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "price_range": {
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          { "to": 300 },
          { "from": 300, "to": 600 },
          { "from": 600 }
        ]
      },
      "aggs": {
        "price_status": {
          "stats": {
            "field": "AvgTicketPrice"
          }
        }
      }
    }
  }
}
Response (truncated; only the first bucket is shown):
"aggregations" : {
  "price_range" : {
    "buckets" : [
      {
        "key" : "*-300.0",
        "to" : 300.0,
        "doc_count" : 1816,
        "price_status" : {
          "count" : 1816,
          "min" : 100.0205307006836,
          "max" : 299.9529113769531,
          "avg" : 212.5348257619379,
          "sum" : 385963.2435836792
        }
      }
    ]
  }
}