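Data Preparation
Elasticsearch Tokenized Full-Text Search - Test Data Preparation
Aggregation Queries
Aggregation queries in ES are similar to aggregate queries in MySQL, but the ES versions are more powerful and offer many more ways to compute statistics.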
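# RESTful syntax for ES aggregation queries
# (mapping types are deprecated since ES 7, so the examples in this post use POST /index/_search)
POST /index/type/_search
{
  "aggs": {
    "name (agg)": {
      "agg_type": {
        "property": "value"
      }
    }
  }
}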
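Distinct Count Query (Cardinality)
Distinct counting, i.e. cardinality, first de-duplicates the values of one specified field across the matched documents and then counts how many distinct values remain. ES computes this count approximately (with HyperLogLog++), so the result can be slightly off for fields with many distinct values.
# Distinct count
POST /sms-logs-index/_search
{
  "aggs": {
    "agg": {
      "cardinality": {
        "field": "province"
      }
    }
  }
}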
Java
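// Imports assume the 7.x high-level REST client and JUnit 5;
// in 6.x, Cardinality lives in a different sub-package.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.Cardinality;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;

@Test
void cardinalityQuery() throws Exception {
    String indexName = "sms-logs-index";
    // ESClient is the project's own helper, not part of the ES client library
    RestHighLevelClient client = ESClient.getClient();

    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(indexName);

    // 2. Specify the aggregation
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.cardinality("agg").field("province"));
    request.source(builder);

    // 3. Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);

    // 4. Print the result, looked up by the aggregation name "agg"
    Cardinality agg = resp.getAggregations().get("agg");
    long value = agg.getValue();
    System.out.println(value);
}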
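As a rough mental model of what the cardinality aggregation returns, the following plain-Java sketch de-duplicates a list of field values and counts what is left. It is exact, whereas the ES result is an approximation. The class name `DistinctCountSketch` and the sample province values are made up for illustration, and the code does not talk to ES at all.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctCountSketch {
    // Exact distinct count of a field's values (one value per document).
    static long distinctCount(List<String> values) {
        Set<String> seen = new HashSet<>(values); // duplicates collapse here
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical "province" values, not taken from the real index
        List<String> provinces = List.of("Beijing", "Shanghai", "Beijing", "Wuhan", "Shanghai");
        System.out.println(distinctCount(provinces)); // prints 3
    }
}
```

Five documents with three different provinces give a distinct count of 3, which is the kind of number `agg.getValue()` reports.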
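Range Aggregation (range)
Counts the documents whose value falls within each given range. For example: how many documents have a value of some field in 0~100, 100~200, and 200~300 respectively.
Range aggregation works on plain numeric values, on date values, and on IP values.
range, date_range, ip_range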
# Numeric range aggregation
POST /sms-logs-index/_search
{
  "aggs": {
    "agg": {
      "range": {
        "field": "fee",
        "ranges": [
          {
            "to": 20
          },
          {
            "from": 20,   # from is inclusive, to is exclusive
            "to": 30
          },
          {
            "from": 30
          }
        ]
      }
    }
  }
}

# Date range aggregation
POST /sms-logs-index/_search
{
  "aggs": {
    "agg": {
      "date_range": {
        "field": "createDate",
        "format": "yyyy",
        "ranges": [
          {
            "to": "2023"     # documents dated before 2023
          },
          {
            "from": "2023"   # documents dated 2023 or later
          }
        ]
      }
    }
  }
}

# IP range aggregation
POST /sms-logs-index/_search
{
  "aggs": {
    "agg": {
      "ip_range": {
        "field": "ipAddr",
        "ranges": [
          {
            "to": "172.16.0.4"
          },
          {
            "from": "172.16.0.4"
          }
        ]
      }
    }
  }
}
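The `ip_range` request above splits addresses at 172.16.0.4. One point that is easy to miss is that addresses are compared by numeric value, not as strings. The sketch below illustrates this for IPv4 only, assuming the same "from inclusive, to exclusive" rule as the numeric `range`. `IpRangeSketch`, `toLong`, and `bucketFor` are hypothetical names, and the code is not how ES implements the aggregation.

```java
class IpRangeSketch {
    // Converts a dotted-quad IPv4 string into its unsigned 32-bit value, held in a long.
    static long toLong(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // shift in one octet at a time
        }
        return value;
    }

    // Names the bucket an address falls into when the ranges are
    // { "to": boundary } and { "from": boundary }.
    static String bucketFor(String ip, String boundary) {
        return toLong(ip) < toLong(boundary) ? "*-" + boundary : boundary + "-*";
    }

    public static void main(String[] args) {
        String boundary = "172.16.0.4";
        System.out.println(bucketFor("172.16.0.3", boundary));  // prints *-172.16.0.4
        System.out.println(bucketFor("172.16.0.4", boundary));  // prints 172.16.0.4-*
        System.out.println(bucketFor("172.16.0.10", boundary)); // prints 172.16.0.4-*
    }
}
```

A plain string comparison would sort "172.16.0.10" before "172.16.0.4"; comparing the numeric values puts it in the upper bucket.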
Java
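// Imports assume the 7.x high-level REST client and JUnit 5
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.range.Range;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;

@Test
void rangeQuery() throws Exception {
    String indexName = "sms-logs-index";
    // ESClient is the project's own helper, not part of the ES client library
    RestHighLevelClient client = ESClient.getClient();

    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(indexName);

    // 2. Specify the aggregation: fee < 20, 20 <= fee < 30, fee >= 30
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.range("agg").field("fee")
            .addUnboundedTo(20)
            .addRange(20, 30)
            .addUnboundedFrom(30));
    request.source(builder);

    // 3. Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);

    // 4. Print each bucket's key, bounds, and document count
    Range agg = resp.getAggregations().get("agg");
    for (Range.Bucket bucket : agg.getBuckets()) {
        String key = bucket.getKeyAsString();
        Object from = bucket.getFrom();
        Object to = bucket.getTo();
        long docCount = bucket.getDocCount();
        System.out.println(String.format("Key:%s From: %s to: %s DocCount: %s", key, from, to, docCount));
    }
}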
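To make the bucket boundaries concrete, here is a small plain-Java sketch of the three `fee` ranges used in this section: below 20, from 20 up to but not including 30, and 30 or above. The class name `RangeBucketSketch`, the bucket labels, and the sample fees are assumptions for illustration; nothing here calls ES.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RangeBucketSketch {
    // Counts values into three buckets: (-inf, 20), [20, 30), [30, +inf).
    // "from" is inclusive and "to" is exclusive, as in the range aggregation.
    static Map<String, Integer> bucketCounts(double[] fees) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("*-20", 0);
        counts.put("20-30", 0);
        counts.put("30-*", 0);
        for (double fee : fees) {
            String key;
            if (fee < 20) {
                key = "*-20";
            } else if (fee < 30) {
                key = "20-30";
            } else {
                key = "30-*";
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical fee values chosen to sit on the boundaries
        double[] fees = {17, 20, 29.9, 30, 45};
        System.out.println(bucketCounts(fees)); // prints {*-20=1, 20-30=2, 30-*=2}
    }
}
```

The boundary values show the rule: 20 lands in the middle bucket and 30 lands in the last one, so no document is counted twice.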
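Statistical Aggregation (extended_stats)
It returns the maximum, minimum, average, sum of squares, and further statistics for a specified field.
# Statistical aggregation
POST /sms-logs-index/_search
{
  "aggs": {
    "agg": {
      "extended_stats": {
        "field": "fee"
      }
    }
  }
}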
Response
"aggregations" : {
  "agg" : {
    "count" : 8,
    "min" : 17.0,
    "max" : 45.0,
    "avg" : 31.25,
    "sum" : 250.0,
    "sum_of_squares" : 8468.0,
    "variance" : 81.9375,
    "variance_population" : 81.9375,
    "variance_sampling" : 93.64285714285714,
    "std_deviation" : 9.051933495115836,
    "std_deviation_population" : 9.051933495115836,
    "std_deviation_sampling" : 9.676923950453322,
    "std_deviation_bounds" : {
      "upper" : 49.35386699023167,
      "lower" : 13.146133009768327,
      "upper_population" : 49.35386699023167,
      "lower_population" : 13.146133009768327,
      "upper_sampling" : 50.60384790090664,
      "lower_sampling" : 11.896152099093356
    }
  }
}
Java
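// Imports assume the 7.x high-level REST client and JUnit 5;
// in 6.x, ExtendedStats lives in a different sub-package.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.ExtendedStats;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;

@Test
void extendedQuery() throws Exception {
    String indexName = "sms-logs-index";
    // ESClient is the project's own helper, not part of the ES client library
    RestHighLevelClient client = ESClient.getClient();

    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(indexName);

    // 2. Specify the aggregation
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.extendedStats("agg").field("fee"));
    request.source(builder);

    // 3. Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);

    // 4. Print the result; ExtendedStats also exposes the other fields of the response
    ExtendedStats agg = resp.getAggregations().get("agg");
    double max = agg.getMax();
    double min = agg.getMin();
    System.out.println(String.format("Max:%s Min: %s ", max, min));
}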
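The derived fields in the `extended_stats` response shown earlier in this section follow from just three numbers: `count`, `sum`, and `sum_of_squares`. The sketch below recomputes them with the standard textbook formulas, using a deviation-bounds width of 2 standard deviations (the documented default `sigma`). The class `ExtendedStatsSketch` and its `derive` method are hypothetical, and this is not necessarily how ES computes the values internally.

```java
class ExtendedStatsSketch {
    // Returns {avg, variancePopulation, varianceSampling, stdDeviation, upperBound, lowerBound}
    // derived from the document count, the sum, and the sum of squares.
    static double[] derive(long count, double sum, double sumOfSquares) {
        double avg = sum / count;
        // Population variance: mean of squares minus square of mean
        double variancePopulation = sumOfSquares / count - avg * avg;
        // Sampling variance divides by (count - 1) instead of count
        double varianceSampling = (sumOfSquares - count * avg * avg) / (count - 1);
        double stdDeviation = Math.sqrt(variancePopulation);
        double sigma = 2.0; // assumed default width of std_deviation_bounds
        return new double[] {
            avg, variancePopulation, varianceSampling, stdDeviation,
            avg + sigma * stdDeviation, avg - sigma * stdDeviation
        };
    }

    public static void main(String[] args) {
        // count, sum and sum_of_squares taken from the response in this section
        double[] s = derive(8, 250.0, 8468.0);
        System.out.println("avg=" + s[0]);
        System.out.println("variance_population=" + s[1]);
        System.out.println("variance_sampling=" + s[2]);
        System.out.println("std_deviation=" + s[3]);
        System.out.println("upper=" + s[4] + " lower=" + s[5]);
    }
}
```

With the response's inputs, the average comes out as 250 / 8 = 31.25 and the population variance as 8468 / 8 - 31.25² = 81.9375, matching the `avg` and `variance` fields.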