Overview
Continuing to learn ES with instructor 中华石杉, part 18
Course link: https://www.roncoo.com/view/55
This follows the previous post, 白话Elasticsearch17-match_phrase query (phrase-match search)
Official docs
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
What slop means
The official docs say:
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2.
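So what is slop?
Take the terms in the query string (the search text). The number of moves those terms need before they line up with a document is the slop.
- A phrase match with slop is a proximity match
- If we specify a slop, the search terms are allowed to move in an attempt to match the doc
- The search terms may sit some distance apart, but the closer together they are, the higher the doc ranks; that is what proximity match means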
Example
Work out how many moves a query string needs before it matches a document, then set slop accordingly.
Suppose we have this doc:
hello world, java is very good, spark is also very good.
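If we search for java spark with a plain match_phrase query (slop defaults to 0), we will not find it, because match_phrase looks for java spark as a single unit: the two terms adjacent and in that order.
If we specify a slop, java spark is allowed to move in an attempt to match the doc.
The slop needed here is 3. In the doc, three terms (is, very, good) sit between java and spark, so spark has to move 3 positions before the phrase java spark lines up with the doc.
Slop is not just the number of moves it takes for the query terms to match a doc. It is the maximum number of moves the query terms are allowed to make while trying to match a doc.
With slop set to 3, we are fine: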
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 3
      }
    }
  }
}
This matches the doc above, and it comes back in the results.
If slop is set to 2, however, spark can move at most 2 positions. That is not enough to match the doc, so it is not returned.
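To make the move counting concrete, here is a minimal Python sketch of it. This is a simplified model, not Elasticsearch's actual implementation: `tokenize` and `required_slop` are hypothetical helpers, the tokenizer only approximates the standard analyzer, and the query terms are assumed to be distinct.

```python
import itertools
import re


def tokenize(text):
    # Assumption: lowercase and split on non-alphanumeric characters,
    # which only approximates the Elasticsearch standard analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def required_slop(query, doc):
    """Smallest slop at which the phrase would match, or None if a term is missing."""
    doc_tokens = tokenize(doc)
    # For each query term, collect the positions where it occurs in the doc.
    occurrences = [
        [pos for pos, tok in enumerate(doc_tokens) if tok == term]
        for term in tokenize(query)
    ]
    if not occurrences or any(not positions for positions in occurrences):
        return None
    best = None
    for combo in itertools.product(*occurrences):
        # Shift each doc position back by the term's offset in the query.
        # An exact phrase gives identical shifted values (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


doc = "hello world, java is very good, spark is also very good."
print(required_slop("java spark", doc))  # 3
```

Under this model a query matches when its `slop` is greater than or equal to the returned value, which is why slop 3 finds the doc and slop 2 does not.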
Example 1
Let's verify this with our test data.
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 3
      }
    }
  }
}
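Analyzing the slop
data has to move 3 positions before spark data lines up with the doc, so slop 3 is enough. Any value larger than 3 finds it as well. Keep in mind that slop is an upper bound on the number of moves: this doc needs 3 moves, so the slop must be at least 3.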
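Example 2
What if we search for data spark instead? Will it still match? The answer: yes, as long as the slop is large enough.
Here is why: as the official docs quoted earlier put it, terms can match in any order, but transposed terms have a slop of 2. A reversed query therefore needs a larger slop than the same terms in their original order.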
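The cost of reversing the term order can be sketched in a few lines of Python. This is a simplified two-term model under our own assumptions, not Lucene's code: `two_term_slop` is a hypothetical helper, and the positions below are made up for illustration rather than taken from the test data.

```python
def two_term_slop(first_pos, second_pos):
    """Moves needed for a two-term phrase.

    first_pos / second_pos are the doc positions of the first and
    second query term (hypothetical inputs for illustration).
    """
    # In an exact phrase the second term sits one position after the first.
    return abs(second_pos - first_pos - 1)


# Doc tokens "a b": a is at position 0, b at position 1.
print(two_term_slop(0, 1))  # query "a b": 0
print(two_term_slop(1, 0))  # query "b a": 2

# Two terms with three other tokens between them (positions 0 and 4).
print(two_term_slop(0, 4))  # in order: 3
print(two_term_slop(4, 0))  # reversed: 5
```

Swapping two adjacent terms costs 2, matching the "transposed terms have a slop of 2" line from the docs, and a reversed query always costs 2 more than the in-order one in this model.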
Example 3
In a slop search, the closer together the search terms are, the higher the relevance score.
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java blog",
        "slop": 5
      }
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.81487787,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.81487787,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.31424814,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [ "java", "hadoop" ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith",
          "new_author_last_name": "Smith",
          "new_author_first_name": "Peter"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 0.31424814,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [ "java", "elasticsearch" ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner",
          "sub_title": "both of them are good",
          "author_first_name": "Robbin",
          "author_last_name": "Li",
          "new_author_last_name": "Li",
          "new_author_first_name": "Robbin"
        }
      }
    ]
  }
}
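We can see that:
Highest score: doc 2, title "this is java blog", _score 0.81487787. Here java and blog are adjacent, so no moves are needed.
Next: doc 1, title "this is java and elasticsearch blog", _score 0.31424814. Two terms sit between java and blog.
Last: doc 4, title "this is java, elasticsearch, hadoop blog", _score 0.31424814. Two terms sit between java and blog here as well, which is why its score equals that of doc 1.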
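The link between distance and ranking can be sketched in Python using the three titles from the response. Two assumptions here: the tokenizer only approximates the standard analyzer, and `sloppy_weight` uses 1 / (1 + distance), the weighting Lucene has traditionally applied to sloppy phrase matches. It is only one ingredient of the final `_score`, not the score itself; `distance` and `sloppy_weight` are hypothetical helper names.

```python
import re

# Titles copied from the search response, keyed by _id.
titles = {
    "2": "this is java blog",
    "1": "this is java and elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
}


def distance(title, first="java", second="blog"):
    # Assumption: lowercase word tokens approximate the standard analyzer.
    tokens = re.findall(r"\w+", title.lower())
    # Moves needed to bring `second` directly after `first`.
    return abs(tokens.index(second) - tokens.index(first) - 1)


def sloppy_weight(dist):
    # Assumption: 1 / (1 + distance); a weight, not the final _score.
    return 1.0 / (1 + dist)


for doc_id, title in titles.items():
    d = distance(title)
    print(doc_id, d, round(sloppy_weight(d), 3))
```

Doc 2 needs 0 moves and gets the full weight, while docs 1 and 4 both need 2 moves and get the same reduced weight, which lines up with the ordering and the tied scores in the response.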