《Elastic Stack 实战手册》(Elastic Stack Practical Handbook) > 3. Product Capabilities > 3.4 Getting Started > 3.4.2 Elasticsearch Basics > 3.4.2.17 Text analysis, settings and mappings > 3.4.2.17.3 Full-text Search / Exact Search (10) https://developer.aliyun.com/article/1229931
4.3 match_phrase
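match_phrase runs a phrase query. A plain match query only looks for the individual terms. match_phrase keeps the order of the terms after the analyzer has processed the query text, and with the default settings it also requires those terms to sit next to each other in the document. This makes it the right choice when the whole phrase must match.
For example, consider a query for "foxes brown".
First, try a match query: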
GET my-index-000001/_search
{
  "query": {
    "match": {
      "full_text": "foxes brown"
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 0.26706278,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.26706278,
        "_source" : { "full_text" : "quick brown foxes!" }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.26706278,
        "_source" : { "full_text" : "Quick Foxes Brown !" }
      }
    ]
  }
}
Both documents are returned, because a match query does not take the order of the terms into account. Now run the same search with match_phrase:
GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "full_text": "foxes brown"
    }
  }
}
# Response
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 0.26706278,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.26706278,
        "_source" : { "full_text" : "Quick Foxes Brown !" }
      }
    ]
  }
}
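Only document 2 is returned. It is the only document in which "foxes" is immediately followed by "brown", which is the same order as in the query.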
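The difference between the two queries can be reproduced offline with a minimal Python sketch. This is not how Elasticsearch works internally. `analyze` is a hypothetical, simplified stand-in for the standard analyzer: it lowercases the text and splits it on non-word characters. `match` models the default OR behaviour of a match query. `match_phrase` models a phrase query with slop 0: it checks whether the query terms appear as one consecutive, ordered run of tokens.

```python
import re

# Toy corpus mirroring the two example documents (ids "1" and "2").
DOCS = {
    "1": "quick brown foxes!",
    "2": "Quick Foxes Brown !",
}

def analyze(text):
    """Hypothetical simplified analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def match(query, docs):
    """match with the default OR operator: a doc hits if it contains any query term."""
    terms = set(analyze(query))
    return sorted(doc_id for doc_id, text in docs.items() if terms & set(analyze(text)))

def match_phrase(query, docs):
    """match_phrase with slop 0: the query terms must appear consecutively and in order."""
    q = analyze(query)
    hits = []
    for doc_id, text in docs.items():
        tokens = analyze(text)
        # Slide a window of len(q) tokens over the document and compare it with the query.
        if any(tokens[i:i + len(q)] == q for i in range(len(tokens) - len(q) + 1)):
            hits.append(doc_id)
    return sorted(hits)

print(match("foxes brown", DOCS))         # ['1', '2']
print(match_phrase("foxes brown", DOCS))  # ['2']
```

If the phrase is reversed to "brown foxes", only document 1 matches under this model. The order of the terms is the only thing that separates the two documents.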
The slop parameter
How does a match_phrase query keep the query terms in their original order?
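It relies on the position of each term, which is recorded when the text is analyzed and indexed, together with the slop parameter. slop defaults to 0 and sets how far apart the matched terms may be. Two terms that are directly next to each other (their positions differ by 1) have a distance of 0, so the default slop matches only adjacent terms.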
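The same rule can be written as a formula for the simple case where the query terms t_1, ..., t_n occur in query order at positions p_1 < p_2 < ... < p_n. The distance d adds up the extra positions between each pair of neighbouring terms. This is a simplification that covers in-order matches only, not a description of Lucene's full sloppy-phrase algorithm.

```latex
d = \sum_{i=1}^{n-1} \left( p_{i+1} - p_i - 1 \right) = (p_n - p_1) - (n - 1),
\qquad \text{match if } d \le \mathit{slop}
```

For two adjacent terms, p_2 - p_1 = 1 and therefore d = 0. This is why the default slop of 0 matches only terms that sit next to each other.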
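The query therefore checks two conditions:
1. The document contains the terms in the same order as the query.
2. The total distance between the matched terms is within the distance allowed by slop.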
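The two conditions can be sketched in Python. This is a simplified model, not Lucene's actual algorithm, and it rests on three assumptions:

- The tokenizer is a hypothetical lowercase `\w+` split.
- The query text "this is test" is assumed, because this section does not show the query used against these documents.
- Only in-order matches count. Real Elasticsearch also lets transposed terms match once slop is at least 2.

```python
import re

# The five example documents from the text.
DOCS = [
    "this is test",
    "this is a test",
    "this is not a test",
    "this a is not a test",
    "this a or is not a test",
]

def tokenize(text):
    """Hypothetical simplified analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def in_order_distance(text, query):
    """Smallest total gap of an in-order occurrence of the query terms, or None."""
    positions = {}
    for pos, tok in enumerate(tokenize(text)):
        positions.setdefault(tok, []).append(pos)  # appended in ascending order
    terms = tokenize(query)
    if not terms or any(t not in positions for t in terms):
        return None
    best = None
    for start in positions[terms[0]]:
        prev = start
        # Condition 1: take the earliest occurrence of each next term after the previous one.
        for t in terms[1:]:
            later = [p for p in positions[t] if p > prev]
            if not later:
                break
            prev = later[0]
        else:
            # Extra positions between neighbouring terms: (last - first) - (n - 1).
            dist = (prev - start) - (len(terms) - 1)
            if best is None or dist < best:
                best = dist
    return best

def phrase_match(text, query, slop=0):
    """Condition 2: the in-order distance must not exceed slop."""
    dist = in_order_distance(text, query)
    return dist is not None and dist <= slop

QUERY = "this is test"  # hypothetical query text
for slop in (0, 2, 4):
    print(slop, len([d for d in DOCS if phrase_match(d, QUERY, slop)]))
# 0 1
# 2 3
# 4 5
```

Under this model the five documents need distances of 0, 1, 2, 3 and 4, so each increase in slop admits one more document. In a real request, slop is set next to the query text, for example `{"match_phrase": {"full_text": {"query": "this is test", "slop": 4}}}`. The field name there is only a placeholder.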
Take the following documents as an example:
○ this is test
○ this is a test
○ this is not a test
○ this a is not a test
○ this a or is not a test
To check the distance between terms, use the _analyze API and read the position of each token. For brevity, only the first and the last document are analyzed here.
GET _analyze
{
  "text": [ "this is test" ]
}
# Response
{
  "tokens" : [
    { "token" : "this", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "is", "start_offset" : 5, "end_offset" : 7, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "test", "start_offset" : 8, "end_offset" : 12, "type" : "<ALPHANUM>", "position" : 2 }
  ]
}
GET _analyze
{
  "text": [ "this a or is not a test" ]
}
# Response
{
  "tokens" : [
    { "token" : "this", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "a", "start_offset" : 5, "end_offset" : 6, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "or", "start_offset" : 8, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "is", "start_offset" : 11, "end_offset" : 13, "type" : "<ALPHANUM>", "position" : 3 },
    { "token" : "not", "start_offset" : 15, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 4 },
    { "token" : "a", "start_offset" : 19, "end_offset" : 20, "type" : "<ALPHANUM>", "position" : 5 },
    { "token" : "test", "start_offset" : 21, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 6 }
  ]
}
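As the output shows, position records the order of each term in the text. In the first document "this", "is" and "test" are at positions 0, 1 and 2. In the last document the same three terms are at positions 0, 3 and 6.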
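Position assignment can be imitated with a few lines of Python. This is only a rough sketch. It assumes a tokenizer that keeps runs of word characters and lowercases them, whereas the real standard tokenizer follows Unicode text segmentation rules. The `analyze` function is hypothetical and simply returns dictionaries shaped like the tokens in the _analyze responses.

```python
import re

def analyze(text):
    """Hypothetical imitation of _analyze output: lowercase word tokens with offsets and positions."""
    return [
        {
            "token": m.group().lower(),
            "start_offset": m.start(),  # index of the first character of the token
            "end_offset": m.end(),      # index just past the last character
            "position": i,              # index of the token in the token stream
        }
        for i, m in enumerate(re.finditer(r"\w+", text))
    ]

for text in ("this is test", "this a or is not a test"):
    print([(t["token"], t["position"]) for t in analyze(text)])
# [('this', 0), ('is', 1), ('test', 2)]
# [('this', 0), ('a', 1), ('or', 2), ('is', 3), ('not', 4), ('a', 5), ('test', 6)]
```

Each token's position is its index in the token stream. As a result, k words between two terms show up as a position difference of k + 1.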