一、问题描述:
在mysql数据库中,我们可以很方面的通过having关键字实现对聚合结果的过滤查询。那么,在ES中该如何实现类似having的先聚合再过滤查询呢?
二、业务场景:
需要找出下单次数大于等于2单,并且平均下单金额大于等于100的客户
在关系型数据库中对应的SQL语句:
SELECT userId, AVG(amount) avgAmount, count(*) orderCount FROM order GROUP by userId HAVING avgAmount >= 100 and orderCount >=2
三、数据准备
创建订单索引order_index,并添加测试数据。
## 删除索引 ## DELETE order_index ## 新建索引 PUT order_index { "mappings": { "properties": { "name": { "type": "keyword" }, "amount": { "type": "integer" } } } } ## 添加数据 POST order_index/_bulk?refresh { "create": { } } { "name": "老万", "amount": 100} { "create": { } } { "name": "老万", "amount": 80} { "create": { } } { "name": "老万", "amount": 300} { "create": { } } { "name": "老王", "amount": 45} { "create": { } } { "name": "小明", "amount": 15} { "create": { } } { "name": "小明", "amount": 50} { "create": { } } { "name": "小红", "amount": 300}
四、具体实现
1、SQL实现方式
说明:由于ES6.3以后已经支持sql查询,所有首先尝试大家最熟悉的sql查询方案能否实现。
POST /_sql?format=txt { "query": "SELECT name,AVG(amount) avgAmount,count(*) orderCount FROM order_index group by name having avgAmount >= 100 and orderCount >=2 " }
查询结果:
用户名为老万,满足平均订单金额大于100,且下单数大于2。
查询结果正确。
2、DSL实现方式
GET order_index/_search { "size": 0, "aggs": { "groupName": { "terms": { "field": "name" }, "aggs": { "avgAmount": { "avg": { "field": "amount" } }, "having": { "bucket_selector": { "buckets_path": { "orderCount": "_count", "avgAmount": "avgAmount" }, "script": { "source": "params.avgAmount >= 100 && params.orderCount >=2 " } } } } } } }
查询结果:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 7, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "groupUserId" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "老万", "doc_count" : 3, "avgAmount" : { "value" : 160.0 } } ] } } }
sql语句底层实现分析:
POST /_sql/translate { "query": "SELECT name,AVG(amount) avgAmount,count(*) orderCount FROM order_index group by name having avgAmount >= 100 and orderCount >=2 " }
执行结果:
分析sql转化的DSL语句,和上面DSL语句的实现,说明两者底层实现原理一致。
mysql中通过having实现根据聚合结果进行过滤,ES中使用 bucket_selector 来实现此功能。
{ "size" : 0, "_source" : false, "stored_fields" : "_none_", "aggregations" : { "groupby" : { "composite" : { "size" : 1000, "sources" : [ { "7e80e5b2" : { "terms" : { "field" : "name", "missing_bucket" : true, "order" : "asc" } } } ] }, "aggregations" : { "d8415567" : { "avg" : { "field" : "amount" } }, "having.having.d8415567_&_having.b26c7698" : { "bucket_selector" : { "buckets_path" : { "a0" : "d8415567", "a1" : "_count" }, "script" : { "source" : "InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.and(InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.gte(params.a0,params.v0)),InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.gte(params.a1,params.v1))))", "lang" : "painless", "params" : { "v0" : 100, "v1" : 2 } }, "gap_policy" : "skip" } } } } } }
具体实现
本文主要介绍了ES中如何实现类似having的先聚合再过滤查询。
1、介绍了基于sql和dsl的两种实现方式,但是二者的底层原理其实都是一样的。
2、实际项目中,更推荐直接采用sql来实现,代码简单,sql语句相比dsl上手更容易,也更容易理解。
3、mysql中通过having实现根据聚合结果进行过滤,ES中使用 bucket_selector 来实现此功能。