概述
继续跟中华石杉老师学习ES,第34篇
课程地址: https://www.roncoo.com/view/55
案例
原始数据:
需求
在白话Elasticsearch33-深入聚合数据分析之案例实战Terms Aggregation + Avg Aggregation ( bucket + metrics ) 中,我们演示了 对颜色进行bucket操作以后,再计算每种颜色的平均价格的metrics操作。
假设 又来了个新需求: 从颜色到品牌进行下钻分析,每种颜色的平均价格,以及找到每种颜色每个品牌的平均价格
那就需要进行多层次的下钻分析了
举个例子:比如说,我们现在的索引中红色的电视有4台,同时这4台电视中,有3台是属于长虹的,1台是属于小米的
那如何计算出 : 红色电视中的3台长虹的平均价格是多少? 红色电视中的1台小米的平均价格是多少?
解决
Step1.对每种颜色进行bucket分组
GET /tvs/sales/_search { "size": 0 , "aggs": { "group_by_color": { "terms": { "field": "color" } } } }
返回
Step2.对每种颜色进行bucket分组 , 然后对每个分组再次计算平均价格
GET /tvs/sales/_search { "size": 0, "aggs": { "group_by_color": { "terms": { "field": "color" }, "aggs": { "color_avg_price": { "avg": { "field": "price" } } } } } }
返回:
Step3.对每种颜色进行bucket分组 , 然后对每个分组再次计算平均价格 , 紧接再对每种颜色按照brand分组,直接写到和 color_avg_price 并列的地方就可以了
GET /tvs/sales/_search { "size": 0, "aggs": { "group_by_color": { "terms": { "field": "color" }, "aggs": { "color_avg_price": { "avg": { "field": "price" } }, "group_by_brand": { "terms": { "field": "brand" } } } } } }
返回
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 8, "max_score": 0, "hits": [] }, "aggregations": { "group_by_color": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "红色", "doc_count": 4, "color_avg_price": { "value": 3250 }, "group_by_brand": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "长虹", "doc_count": 3 }, { "key": "三星", "doc_count": 1 } ] } }, { "key": "绿色", "doc_count": 2, "color_avg_price": { "value": 2100 }, "group_by_brand": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "TCL", "doc_count": 1 }, { "key": "小米", "doc_count": 1 } ] } }, { "key": "蓝色", "doc_count": 2, "color_avg_price": { "value": 2000 }, "group_by_brand": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "TCL", "doc_count": 1 }, { "key": "小米", "doc_count": 1 } ] } } ] } } }
Step4.对每种颜色进行bucket分组 , 然后对每个分组再次计算平均价格 , 紧接再对每种颜色按照brand分组,直接写到和 color_avg_price 并列的地方就可以了。 最后对品牌进行metrics操作,即求每种品牌的平均价格,所以要在 brand 这个bucket中,再次aggs
GET /tvs/sales/_search { "size": 0 , "aggs": { "group_by_color": { "terms": { "field": "color" }, "aggs": { "color_avg_price": { "avg": { "field": "price" } }, "group_by_brand":{ "terms": { "field": "brand" }, "aggs": { "brand_avg_price": { "avg": { "field": "price" } } } } } } } }
到这里,就一步步完成了该需求,来看下返回结果吧
返回:
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 8, "max_score": 0, "hits": [] }, "aggregations": { "group_by_color": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "红色", "doc_count": 4, "color_avg_price": { "value": 3250 }, "group_by_brand": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "长虹", "doc_count": 3, "brand_avg_price": { "value": 1666.6666666666667 } }, { "key": "三星", "doc_count": 1, "brand_avg_price": { "value": 8000 } } ] } }, { "key": "绿色", "doc_count": 2, "color_avg_price": { "value": 2100 }, "group_by_brand": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "TCL", "doc_count": 1, "brand_avg_price": { "value": 1200 } }, { "key": "小米", "doc_count": 1, "brand_avg_price": { "value": 3000 } } ] } }, { "key": "蓝色", "doc_count": 2, "color_avg_price": { "value": 2000 }, "group_by_brand": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "TCL", "doc_count": 1, "brand_avg_price": { "value": 1500 } }, { "key": "小米", "doc_count": 1, "brand_avg_price": { "value": 2500 } } ] } } ] } } }
校验下
原始数据:
我们通过ES算出来的数据:
对比下原始数据,符合预期,至此实现了该需求的DSL 。