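Preface
I'm currently publishing a tutorial series on Elasticsearch (ES). There will be quite a few installments, so if you like it, give it a follow ❤️ ~
Picking up from the previous article, this one covers the conditional query operations left over from last time ~
To make it easier to follow, all examples here reuse the index from the previous article. This one leans toward the practical side, so without further ado, let's get started ~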
Multi-condition compound queries
bool
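ES uses bool to control queries that combine multiple conditions. A bool query supports the following parameters:
- must: matching documents must satisfy this condition
- must_not: matching documents must not satisfy this condition
- should: matching documents should satisfy this condition. should clauses are used to adjust the relevance score of the results. Note that if the compound query has no must (or filter) clause, a document has to match at least one should clause. If there is a must clause, matching should is not required, and should only serves to adjust the score
- filter: matching documents must satisfy this condition, but filter takes no part in scoring. It is used purely to filter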
Here is an example of how to use it:
GET class_1/_search
{ "query": { "bool": { "must": [ {"match": { "name": "apple" }} ], "must_not": [ {"term": { "num": { "value": "5" } }} ], "should": [ {"match": { "name": "k" }} ], "filter": [ {"range": { "num": { "gte": 0, "lte": 10 } }} ] } } }
Response:
{ "took" : 9, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : 0.752627, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "b8fcCoYB090miyjed7YE", "_score" : 0.752627, "_source" : { "name" : "I eat apple so haochi1~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "ccfcCoYB090miyjed7YE", "_score" : 0.752627, "_source" : { "name" : "I eat apple so haochi3~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "cMfcCoYB090miyjed7YE", "_score" : 0.7389809, "_source" : { "name" : "I eat apple so zhen haochi2~", "num" : 1 } } ] } }
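To make the clause semantics concrete, here is a minimal Python sketch that models only which documents a bool query matches, under simplifying assumptions: it ignores scoring, it treats a match query as a lowercase whitespace split, and the names `bool_matches`, `name_has`, and `num_between` are hypothetical helpers, not part of any ES client. The third document is made up to exercise must_not.

```python
from typing import Callable, Dict, Iterable

Doc = Dict[str, object]
Predicate = Callable[[Doc], bool]


def bool_matches(
    doc: Doc,
    must: Iterable[Predicate] = (),
    must_not: Iterable[Predicate] = (),
    should: Iterable[Predicate] = (),
    filters: Iterable[Predicate] = (),
) -> bool:
    """Toy model of bool clause matching; it ignores scoring entirely."""
    required = list(must) + list(filters)
    optional = list(should)
    if not all(pred(doc) for pred in required):
        return False
    if any(pred(doc) for pred in must_not):
        return False
    # Without must/filter clauses, at least one should clause has to match.
    if optional and not required:
        return any(pred(doc) for pred in optional)
    return True


def name_has(word: str) -> Predicate:
    """Crude stand-in for a match query: lowercase, split on whitespace."""
    return lambda doc: word in str(doc.get("name", "")).lower().split()


def num_between(low: int, high: int) -> Predicate:
    """Stand-in for a range query on the num field (inclusive bounds)."""
    def check(doc: Doc) -> bool:
        num = doc.get("num")
        return isinstance(num, int) and low <= num <= high
    return check


docs = [
    {"name": "I eat apple so haochi1~", "num": 1},
    {"name": "b", "num": 6},
    {"name": "apple pie", "num": 5},  # hypothetical document, not in the index
]

hits = [
    doc for doc in docs
    if bool_matches(
        doc,
        must=[name_has("apple")],
        must_not=[lambda d: d.get("num") == 5],
        should=[name_has("k")],
        filters=[num_between(0, 10)],
    )
]
print(hits)
```

Because a must clause is present, the should clause on "k" never excludes anything here; in real ES it would only influence the score.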
constant_score
A constant_score query gives every matching document a fixed score, specified with boost. It is typically used in place of a bool query that contains only a filter.
Here is how to use it:
GET class_1/_search
{ "query": { "constant_score": { "filter": { "term": { "num": 6 } }, "boost": 1.2 } } }
Response:
{ "took" : 7, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.2, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "h2Fg-4UBECmbBdQA6VLg", "_score" : 1.2, "_source" : { "name" : "b", "num" : 6 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "1", "_score" : 1.2, "_source" : { "name" : "l", "num" : 6 } } ] } }
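If you build request bodies in application code, the two filter-only forms can be generated from the same filter clause. The sketch below only constructs the JSON bodies and sends nothing; `constant_score_body` and `filter_only_bool_body` are hypothetical helper names.

```python
import json
from typing import Any, Dict

Clause = Dict[str, Any]


def constant_score_body(filter_clause: Clause, boost: float = 1.0) -> Clause:
    """Wrap a filter so that every hit gets the fixed score `boost`."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}


def filter_only_bool_body(filter_clause: Clause) -> Clause:
    """The bool form that constant_score typically replaces."""
    return {"query": {"bool": {"filter": [filter_clause]}}}


term_num_6 = {"term": {"num": 6}}
body = constant_score_body(term_num_6, boost=1.2)
print(json.dumps(body))
print(json.dumps(filter_only_bool_body(term_num_6)))
```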
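Query validation & analysis
Validation
ES validates a query through the /_validate/query route. Note that it checks whether the query itself is valid; the query is not executed.
Example:
GET class_1/_validate/query?explain
{ "query": { "bool": { "must": [ {"match": { "name": "apple" }} ] } } }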
Normal response:
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "class_1", "valid" : true, "explanation" : "+name:apple" } ] }
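Change the name field to name1 and validate again:
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "class_1", "valid" : true, "explanation" : """+MatchNoDocsQuery("unmapped fields [name1]")""" } ] }
The query is still reported as valid, but the explanation has turned into MatchNoDocsQuery("unmapped fields [name1]"): the field does not exist in the mapping, so the query will match no documents.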
Analysis
ES uses the /_validate/query?explain route to show how a query is interpreted.
Example:
GET class_1/_validate/query?explain
{ "query": { "bool": { "must": [ {"match": { "name": "apple so" }} ] } } }
Response:
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "class_1", "valid" : true, "explanation" : "+(name:apple name:so)" } ] }
As "explanation" : "+(name:apple name:so)" shows, the query text apple so was tokenized into name:apple and name:so.
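Since an unmapped field still comes back with "valid" : true, a client-side check has to look at the explanation text as well. The following is a small sketch, assuming the response shape shown above; `no_match_explanations` is a hypothetical helper, and the two sample dictionaries are abbreviated copies of the responses in this section.

```python
from typing import Any, Dict, List


def no_match_explanations(response: Dict[str, Any]) -> List[str]:
    """Collect explanations saying the query was rewritten to match nothing."""
    return [
        item["explanation"]
        for item in response.get("explanations", [])
        if "MatchNoDocsQuery" in item.get("explanation", "")
    ]


ok_response = {
    "valid": True,
    "explanations": [
        {"index": "class_1", "valid": True, "explanation": "+name:apple"}
    ],
}
unmapped_response = {
    "valid": True,
    "explanations": [
        {
            "index": "class_1",
            "valid": True,
            "explanation": '+MatchNoDocsQuery("unmapped fields [name1]")',
        }
    ],
}

print(no_match_explanations(ok_response))
print(no_match_explanations(unmapped_response))
```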
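Sorting
Default sorting
In the earlier examples the default ordering is by _score descending, so the best matches come first. Computing _score, however, costs query performance, which is not hard to understand.
When _score is not needed, build the query with filter or constant_score instead.
filter example:
GET class_1/_search
{ "query": { "bool": { "filter": [ {"term": { "num": 1 }} ] } } }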
Response:
{ "took" : 5, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : 0.0, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "b8fcCoYB090miyjed7YE", "_score" : 0.0, "_source" : { "name" : "I eat apple so haochi1~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "ccfcCoYB090miyjed7YE", "_score" : 0.0, "_source" : { "name" : "I eat apple so haochi3~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "cMfcCoYB090miyjed7YE", "_score" : 0.0, "_source" : { "name" : "I eat apple so zhen haochi2~", "num" : 1 } } ] } }
In the results every _score is 0.0, which shows that no scoring took place.
constant_score example:
GET class_1/_search
{ "query": { "constant_score": { "filter": { "term": { "num": 1 } }, "boost": 1.2 } } }
Response:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : 1.2, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "b8fcCoYB090miyjed7YE", "_score" : 1.2, "_source" : { "name" : "I eat apple so haochi1~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "ccfcCoYB090miyjed7YE", "_score" : 1.2, "_source" : { "name" : "I eat apple so haochi3~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "cMfcCoYB090miyjed7YE", "_score" : 1.2, "_source" : { "name" : "I eat apple so zhen haochi2~", "num" : 1 } } ] } }
As you can see, every returned score is the value specified with boost.
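Custom sorting
Custom sorting covers most scenarios. So how is it done in ES? ES uses the sort parameter to define the sort order, which is ascending by default. And how do you sort in descending order?
- Ascending
{"sort":["num"]}
- Descending (desc means descending order)
{"sort":[{"num":{"order":"desc"}}]}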
tips
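- ES implements field sorting on top of doc values, a columnar storage structure
- text fields do not have doc values, so you cannot sort on a text field directly
- You can enable sorting on a text field by setting the field property fielddata=true, but this is not recommended: sorting on text is extremely expensive and rarely gives the ordering you actually want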
Single-field sorting
GET class_1/_search
{ "sort": [ "num" ] }
Response:
{ "took" : 6, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 11, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "b8fcCoYB090miyjed7YE", "_score" : null, "_source" : { "name" : "I eat apple so haochi1~", "num" : 1 }, "sort" : [ 1 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "ccfcCoYB090miyjed7YE", "_score" : null, "_source" : { "name" : "I eat apple so haochi3~", "num" : 1 }, "sort" : [ 1 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "cMfcCoYB090miyjed7YE", "_score" : null, "_source" : { "name" : "I eat apple so zhen haochi2~", "num" : 1 }, "sort" : [ 1 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "h2Fg-4UBECmbBdQA6VLg", "_score" : null, "_source" : { "name" : "b", "num" : 6 }, "sort" : [ 6 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "1", "_score" : null, "_source" : { "name" : "l", "num" : 6 }, "sort" : [ 6 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "3", "_score" : null, "_source" : { "num" : 9, "name" : "e", "age" : 9, "desc" : [ "hhhh" ] }, "sort" : [ 9 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "4", "_score" : null, "_source" : { "name" : "f", "age" : 10, "num" : 10 }, "sort" : [ 10 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "RWlfBIUBDuA8yW5cu9wu", "_score" : null, "_source" : { "name" : "一年级", "num" : 20 }, "sort" : [ 20 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "iGFt-4UBECmbBdQAnVJe", "_score" : null, "_source" : { "name" : "g", "age" : 8 }, "sort" : [ 9223372036854775807 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "iWFt-4UBECmbBdQAnVJg", "_score" : null, "_source" : { "name" : "h", "age" : 9 }, "sort" : [ 9223372036854775807 ] } ] } }
The results are sorted by num in ascending order, the default. Documents without a num field come last, carrying the placeholder sort value 9223372036854775807.
Now in descending order:
GET class_1/_search
{ "sort": [ {"num": {"order":"desc"}} ] }
Response:
{ "took" : 15, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 11, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "RWlfBIUBDuA8yW5cu9wu", "_score" : null, "_source" : { "name" : "一年级", "num" : 20 }, "sort" : [ 20 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "4", "_score" : null, "_source" : { "name" : "f", "age" : 10, "num" : 10 }, "sort" : [ 10 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "3", "_score" : null, "_source" : { "num" : 9, "name" : "e", "age" : 9, "desc" : [ "hhhh" ] }, "sort" : [ 9 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "h2Fg-4UBECmbBdQA6VLg", "_score" : null, "_source" : { "name" : "b", "num" : 6 }, "sort" : [ 6 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "1", "_score" : null, "_source" : { "name" : "l", "num" : 6 }, "sort" : [ 6 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "b8fcCoYB090miyjed7YE", "_score" : null, "_source" : { "name" : "I eat apple so haochi1~", "num" : 1 }, "sort" : [ 1 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "ccfcCoYB090miyjed7YE", "_score" : null, "_source" : { "name" : "I eat apple so haochi3~", "num" : 1 }, "sort" : [ 1 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "cMfcCoYB090miyjed7YE", "_score" : null, "_source" : { "name" : "I eat apple so zhen haochi2~", "num" : 1 }, "sort" : [ 1 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "iGFt-4UBECmbBdQAnVJe", "_score" : null, "_source" : { "name" : "g", "age" : 8 }, "sort" : [ -9223372036854775808 ] }, { "_index" : "class_1", "_type" : "_doc", "_id" : "iWFt-4UBECmbBdQAnVJg", "_score" : null, "_source" : { "name" : "h", "age" : 9 }, "sort" : [ -9223372036854775808 ] } ] } }
Now the results are in descending order. Documents without num still come last, this time with the sort value -9223372036854775808.
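The extreme sort values in the two responses are the largest and smallest 64-bit integers, which is how documents lacking num end up last in both directions. A rough Python model of that behaviour, assuming a long field and the default handling of missing values (`sort_by_num` is a hypothetical name, and the sample documents are a subset of the index):

```python
from typing import Any, Dict, List

LONG_MAX = 2**63 - 1   # 9223372036854775807
LONG_MIN = -(2**63)    # -9223372036854775808


def sort_by_num(docs: List[Dict[str, Any]], descending: bool = False) -> List[Dict[str, Any]]:
    """Sort by num, pushing documents without num to the end either way."""
    sentinel = LONG_MIN if descending else LONG_MAX
    return sorted(docs, key=lambda d: d.get("num", sentinel), reverse=descending)


sample = [
    {"name": "g", "age": 8},
    {"name": "f", "num": 10},
    {"name": "b", "num": 6},
    {"name": "一年级", "num": 20},
]

ascending_names = [d["name"] for d in sort_by_num(sample)]
descending_names = [d["name"] for d in sort_by_num(sample, descending=True)]
print(ascending_names)
print(descending_names)
```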
Multi-field sorting
GET class_1/_search
{ "sort": [ "num", "age" ] }
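scroll pagination
Remember the from+size pagination covered earlier? By default ES caps from+size at 10000 documents, and fetching a larger volume of data in bulk with from+size is very expensive.
Yet in many applications the data volume is huge, for example when querying system logs. For these cases ES offers scrolling queries through the /_search/scroll route. It works roughly like taking a snapshot of the cluster's data (covering every shard) at the moment the query starts and keeping it for a while. Once the configured expiry time has passed, the snapshot is deleted when ES is idle.
Note that because the query runs against a snapshot, changes made to the data after the snapshot was created are not visible within that scroll.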
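As a quick illustration of the 10000 limit: the check is on from + size, not on the page number, so with a page size of 10 the last page that still fits is page 1000. A sketch, where `within_result_window` and `page_to_from` are hypothetical helpers and 10000 is the default mentioned above:

```python
def page_to_from(page: int, size: int) -> int:
    """Translate a 1-based page number into the from offset."""
    return (page - 1) * size


def within_result_window(from_: int, size: int, max_result_window: int = 10000) -> bool:
    """True when a from+size request stays inside the allowed window."""
    return from_ + size <= max_result_window


last_ok = within_result_window(page_to_from(1000, 10), 10)
first_rejected = within_result_window(page_to_from(1001, 10), 10)
print(last_ok, first_rejected)
```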
Initialize the snapshot & keep it for 10 minutes
Query example:
GET class_1/_search?scroll=10m
{ "query": { "match_phrase": { "name": "apple" } }, "size": 2 }
Response:
{ "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==", "took" : 6, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : 0.752627, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "b8fcCoYB090miyjed7YE", "_score" : 0.752627, "_source" : { "name" : "I eat apple so haochi1~", "num" : 1 } }, { "_index" : "class_1", "_type" : "_doc", "_id" : "ccfcCoYB090miyjed7YE", "_score" : 0.752627, "_source" : { "name" : "I eat apple so haochi3~", "num" : 1 } } ] } }
As shown above, 2 documents were returned together with a snapshot ID (_scroll_id), which later requests use to keep scrolling:
Scroll using the snapshot ID
GET /_search/scroll
{ "scroll": "10m", "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==" }
Response:
{ "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==", "took" : 6, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : 0.752627, "hits" : [ { "_index" : "class_1", "_type" : "_doc", "_id" : "cMfcCoYB090miyjed7YE", "_score" : 0.7389809, "_source" : { "name" : "I eat apple so zhen haochi2~", "num" : 1 } } ] } }
Scroll once more:
{ "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==", "took" : 1, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : 0.752627, "hits" : [ ] } }
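Some readers may wonder how the scroll advances, since every follow-up request uses the same scroll_id. The results make it clear:
- First, a 10-minute snapshot was created with a page size of 2, and the initial request returned 2 documents
- Scrolling with the scroll_id returned 1 document, because the snapshot holds only 3 documents in total and 2 had already been returned
- Scrolling again returned an empty result, because all the data had been read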
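The three steps above amount to a loop: start the scroll, keep requesting the next batch with the most recently returned _scroll_id, and stop at the first empty batch. Here is a sketch of that loop in Python. To stay runnable without a cluster it takes two callables instead of a real client, and `FakeScrollBackend` is an in-memory stand-in invented for this example; in practice the callables would issue the two HTTP requests shown earlier.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def scroll_all(
    start: Callable[[], Page],
    fetch_next: Callable[[str], Page],
) -> Iterator[Any]:
    """Yield every hit, one page at a time, until an empty page arrives."""
    page = start()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always continue with the scroll id from the latest response.
        page = fetch_next(page["_scroll_id"])


class FakeScrollBackend:
    """In-memory stand-in for the two scroll requests (illustrative only)."""

    def __init__(self, docs: List[Any], size: int) -> None:
        self._docs = docs
        self._size = size
        self._pos = 0
        self.calls = 0

    def _page(self) -> Page:
        self.calls += 1
        batch = self._docs[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}

    def start(self) -> Page:
        self._pos = 0
        return self._page()

    def fetch_next(self, scroll_id: str) -> Page:
        return self._page()


backend = FakeScrollBackend(["haochi1", "haochi3", "haochi2"], size=2)
collected = list(scroll_all(backend.start, backend.fetch_next))
print(collected, backend.calls)
```

With 3 documents and a page size of 2, the loop makes three requests in total, mirroring the initial search and the two scroll calls above.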
Closing
That's it for this article. Be sure to practice a lot. Next time we move on to advanced queries ~