Reading the Elastic Stack Practical Handbook (《Elastic Stack 实战手册》), Part 34: 3.4.2.17.3 Full-Text Search / Exact Search (11) - Alibaba Cloud Developer Community

2023-05-25

4.3 match_phrase

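match_phrase runs a phrase query. Like match, it first passes the query text through the analyzer, but unlike a plain match query it preserves the order of the resulting terms, so it only matches documents that contain the complete phrase.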
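To see what "preserving term order" requires of the index, here is a minimal Python sketch of a positional inverted index: for every term, the documents it appears in and the positions it occupies there. The `analyze` helper is a hypothetical, much simplified stand-in for the standard analyzer (lowercase, keep runs of ASCII letters and digits), and `build_positional_index` is likewise a name made up for this sketch; the two sample texts are the documents used in the example that follows.

```python
import re
from collections import defaultdict


def analyze(text):
    # Hypothetical stand-in for the standard analyzer, adequate for plain
    # ASCII text only: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    # term -> {doc_id: [positions]}. A plain match only needs to know which
    # documents contain a term; a phrase query also needs the positions.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term][doc_id].append(position)
    return index


docs = {"1": "quick brown foxes!", "2": "Quick Foxes Brown !"}
index = build_positional_index(docs)
print(dict(index["foxes"]))  # {'1': [2], '2': [1]}
print(dict(index["brown"]))  # {'1': [1], '2': [2]}
```

Both documents contain foxes and brown, but only in document 2 does foxes sit at the position directly before brown; that positional information is what a phrase query checks.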

For example, consider a query for "foxes brown".

First, run a plain match query:

GET my-index-000001/_search
{
  "query": {
    "match": {
      "full_text": "foxes brown"
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.26706278,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.26706278,
        "_source" : {
          "full_text" : "quick brown foxes!"
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.26706278,
        "_source" : {
          "full_text" : "Quick Foxes Brown !"
        }
      }
    ]
  }
}

Both documents are returned: a match query does not care about the order of the terms. Now run the same query with match_phrase:

GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "full_text": "foxes brown"
    }
  }
}
# Response
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.26706278,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.26706278,
        "_source" : {
          "full_text" : "Quick Foxes Brown !"
        }
      }
    ]
  }
}

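Only document 2 is returned. After analysis (lowercasing and dropping the punctuation), document 2 contains foxes immediately followed by brown, the same order as in the query text, whereas document 1 contains brown followed by foxes.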
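The difference between the two queries can be mimicked with a minimal sketch. `match_any` models a match query with its default `or` operator (any query term is enough, positions are ignored), while `phrase_match` models match_phrase with slop 0 (the terms must occupy consecutive positions in query order). Both function names and the simplified `analyze` helper are assumptions of this sketch, and relevance scoring is left out entirely.

```python
import re


def analyze(text):
    # Simplified stand-in for the standard analyzer (plain ASCII text only).
    return re.findall(r"[a-z0-9]+", text.lower())


def match_any(query, text):
    # match with operator "or": order and positions do not matter.
    return bool(set(analyze(query)) & set(analyze(text)))


def phrase_match(query, text):
    # match_phrase with slop 0: look for the query terms as a contiguous,
    # identically ordered run of tokens in the document.
    q, d = analyze(query), analyze(text)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


docs = {"1": "quick brown foxes!", "2": "Quick Foxes Brown !"}
print([i for i, t in docs.items() if match_any("foxes brown", t)])     # ['1', '2']
print([i for i, t in docs.items() if phrase_match("foxes brown", t)])  # ['2']
```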

The slop parameter

Why does match_phrase keep the query terms in their original order?

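This relies on the term positions recorded in the index together with the slop parameter. slop defaults to 0 and controls how far apart the matched terms may be: two terms that are directly adjacent (their positions differ by 1) are at distance 0, which is all that the default slop of 0 allows.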
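The arithmetic behind "adjacent means distance 0" is small enough to write down. In this sketch, `term_distance` and `within_slop` are hypothetical helpers that only model two terms appearing in query order; the inputs are positions of the kind the _analyze API reports.

```python
def term_distance(first_pos, second_pos):
    # Number of positions strictly between two terms that appear in query
    # order; adjacent terms (positions differ by 1) have distance 0.
    return second_pos - first_pos - 1


def within_slop(first_pos, second_pos, slop=0):
    # In-order terms only: the pair matches when the gap fits within slop.
    return second_pos > first_pos and term_distance(first_pos, second_pos) <= slop


print(within_slop(0, 1))          # True: adjacent terms satisfy the default slop 0
print(within_slop(0, 3))          # False: two positions lie in between
print(within_slop(0, 3, slop=2))  # True
```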

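A document matches only when both of the following hold:

1. The query terms appear in the document in the same order as in the query.

2. The total distance between the matched terms (the number of positions separating them) does not exceed the slop value.

Strictly speaking, condition 1 is enforced only while slop is 0 or 1: moving a term past its neighbor costs a distance of 2, so with slop set to 2 or more, terms in swapped order can also match.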
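The two conditions can be sketched in Python. The sketch follows the idea behind Lucene's sloppy phrase matching as we understand it: shift each matched position by the term's offset within the query, and compare the spread of the shifted values with slop. For terms in query order this spread equals the sum of the gaps between them, and any reordered pair costs at least 2. `match_length` and `sloppy_phrase_match` are hypothetical names; real Lucene additionally deals with repeated terms and scoring, which this sketch ignores.

```python
from itertools import product


def match_length(positions):
    # positions[i] is the document position chosen for the i-th query term.
    # Subtracting the query offset i makes an exact phrase collapse onto a
    # single value; the spread of the shifted values is the distance.
    shifted = [pos - i for i, pos in enumerate(positions)]
    return max(shifted) - min(shifted)


def sloppy_phrase_match(term_positions, slop=0):
    # term_positions[i] lists every position of the i-th query term in one
    # document. Try every combination and keep the tightest one.
    if not term_positions or not all(term_positions):
        return False  # a missing term can never be compensated by slop
    return min(match_length(c) for c in product(*term_positions)) <= slop


# "this is test" against "this a or is not a test": positions 0, 3, 6.
print(match_length([0, 3, 6]))                       # 4 = 2 + 2 (sum of gaps)
print(sloppy_phrase_match([[0], [3], [6]], slop=3))  # False
print(sloppy_phrase_match([[0], [3], [6]], slop=4))  # True
# "foxes brown" against "quick brown foxes!": foxes at 2, brown at 1.
print(match_length([2, 1]))                          # 2: a swap costs 2
```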

Take the following documents as an example:

○ this is test

○ this is a test

○ this is not a test

○ this a is not a test

○ this a or is not a test
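To connect these documents with the two conditions, the following sketch computes the smallest slop each one would need. It assumes the query text is "this is test" (the query itself is not shown in this part) and uses a simplified, hypothetical `analyze` helper instead of a real Elasticsearch analyzer; `min_slop` is likewise a name made up for the sketch.

```python
import re
from itertools import product


def analyze(text):
    # Simplified stand-in for the standard analyzer (plain ASCII only).
    # The standard analyzer has no stop words by default, so "a" is kept.
    return re.findall(r"[a-z0-9]+", text.lower())


def min_slop(query, text):
    # Smallest slop under which `text` would match `query` as a phrase,
    # or None if the query is empty or some query term does not occur.
    q, d = analyze(query), analyze(text)
    term_positions = [[p for p, t in enumerate(d) if t == term] for term in q]
    if not q or not all(term_positions):
        return None
    best = None
    for combo in product(*term_positions):
        shifted = [pos - i for i, pos in enumerate(combo)]
        length = max(shifted) - min(shifted)
        best = length if best is None else min(best, length)
    return best


docs = [
    "this is test",
    "this is a test",
    "this is not a test",
    "this a is not a test",
    "this a or is not a test",
]
for text in docs:
    print(f"{text!r}: needs slop >= {min_slop('this is test', text)}")
```

Under these assumptions the five documents need slop values of 0, 1, 2, 3, and 4 respectively, so each increase of slop admits one more of them.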

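To check the distances between the terms, we can call the _analyze API and compute them from the position values it returns. For brevity, we only look at the first and the last document.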

GET _analyze
{
  "text": [
    "this is test"
  ]
}
# Response
{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "is",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "test",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

GET _analyze
{
  "text": [
    "this a  or is  not a test"
  ]
}
# Response
{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "a",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "is",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "not",
      "start_offset" : 15,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "a",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "test",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "<ALPHANUM>",
      "position" : 6
    }
  ]
}

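As the responses show, position records the order of the terms: in the first document this, is, and test occupy the consecutive positions 0, 1, and 2, while in the last document they occupy positions 0, 3, and 6.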
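The position bookkeeping in these responses can be imitated for plain ASCII text with a few lines of Python. This is only an approximation of the standard tokenizer, which follows the Unicode word-boundary rules and handles much more than ASCII words; the helper name `analyze_with_offsets` is made up for this sketch, and the `type` field of the real response is omitted.

```python
import re


def analyze_with_offsets(text):
    # Every run of ASCII letters/digits becomes one token. Offsets are
    # character indexes into the original text (end is exclusive), and
    # position simply counts tokens, so extra spaces change offsets but
    # never positions.
    return [
        {
            "token": m.group().lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        }
        for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text))
    ]


tokens = analyze_with_offsets("this a  or is  not a test")
for t in tokens:
    print(t["token"], t["start_offset"], t["end_offset"], t["position"])

# Gaps between the phrase terms, as used for the slop check.
pos = {t["token"]: t["position"] for t in tokens if t["token"] in ("this", "is", "test")}
print(pos["is"] - pos["this"] - 1, pos["test"] - pos["is"] - 1)  # 2 2
```

Running it on the two texts above reproduces the token, offset, and position values shown in the _analyze responses.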

