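Read Along with 《Elastic Stack 实战手册》 (Elastic Stack Practical Handbook), Part 34: 3.4.2.17.3 Full-Text Search / Exact Search (13) - Alibaba Cloud Developer Community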


2023-05-25



Let's look at the correct way to use the IK analyzer with Chinese text:

- Target document: "新年快乐，万事如意".

- Goal: query with the combined term "快乐，万事" and still match the target document.

# Prepare test data
PUT match_phrase
POST match_phrase/_mapping
{
  "properties": {
    "ik-smart": {
      "type": "text",
      "analyzer": "ik_smart"
    },
    "ik-max": {
      "type": "text",
      "analyzer": "ik_max_word"
    }
  }
}
POST match_phrase/_doc/1
{
  "ik-smart": "新年快乐，万事如意",
  "ik-max": "新年快乐，万事如意"
}
Here, "新年快乐，万事如意" is indexed into two fields, analyzed by the IK analyzers ik_smart and ik_max_word respectively. The resulting tokens are as follows:
# ik_smart analyzer
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "新年快乐，万事如意"
}
# Response
{
  "tokens" : [
    {
      "token" : "新年快乐",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "万事如意",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
Clearly, ik_smart produces no separate "快乐" or "万事" tokens, so a phrase search for "快乐，万事" on the ik-smart field returns no results.
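To see this directly, the same phrase can be searched on the ik-smart field of the match_phrase index created above. Based on the ik_smart tokens listed above, this request is expected to return no hits; adding slop would not help, because the terms themselves are missing from the index:

```
# Same phrase, but against the field analyzed with ik_smart
GET match_phrase/_search
{
  "query": {
    "match_phrase": {
      "ik-smart": "快乐，万事"
    }
  }
}
```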
Now let's look at ik_max_word:
# ik_max_word analyzer
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "新年快乐，万事如意"
}
# Response
{
  "tokens" : [
    {
      "token" : "新年快乐",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "新年",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "快乐",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "万事如意",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "万事",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "万",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "TYPE_CNUM",
      "position" : 5
    },
    {
      "token" : "事",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "如意",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 7
    }
  ]
}

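This time the tokens "快乐" (position 2) and "万事" (position 4) are both produced, but their positions differ by 2, while adjacent terms in a phrase differ by only 1. To match the document, the query therefore needs a slop of 1.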
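The slop value can be worked out as a small calculation. This is a simplified view of sloppy phrase matching for two terms in order, and it assumes the query text "快乐，万事" is analyzed into t_1 = 快乐 at query position 0 and t_2 = 万事 at query position 1. Here p_d is a token's position in the indexed document (from the ik_max_word output above) and p_q its position in the query:

```latex
\text{slop}_{\min}
  = \bigl(p_d(t_2) - p_d(t_1)\bigr) - \bigl(p_q(t_2) - p_q(t_1)\bigr)
  = (4 - 2) - (1 - 0)
  = 1
```

If ik_max_word emits additional tokens for the query text, their positions would need to be included in the same way.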

So the correct query is:

GET match_phrase/_search
{
  "query": {
    "match_phrase": {
      "ik-max": {
        "query": "快乐，万事",
        "slop": 1
      }
    }
  }
}
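Because the required slop depends on the positions the query text receives at search time, it can help to run the query string through the same analyzer as the field. Passing "field" to the _analyze API makes Elasticsearch use the analyzer mapped for that field (here ik_max_word for ik-max). The exact tokens returned depend on the IK plugin version and its dictionary, so no output is shown:

```
# Inspect how the query text is tokenized by the ik-max field's analyzer
POST match_phrase/_analyze
{
  "field": "ik-max",
  "text": "快乐，万事"
}
```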

4.4 match_phrase_prefix

A match_phrase_prefix query analyzes the query text and then runs it as a match_phrase query, except that the last term is matched as a prefix rather than as an exact term.

For example, consider these three documents:

- quick brown fox

- two quick brown ferrets

- the fox is quick and brown
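To reproduce the example, the three documents could be indexed as shown below. This is a sketch: the document IDs are arbitrary, and it assumes my-index-000001 does not exist yet, so dynamic mapping turns message into a text field analyzed by the standard analyzer. The refresh parameter makes the documents searchable immediately:

```
# Hypothetical setup: index the three sample documents in one bulk request
POST my-index-000001/_bulk?refresh
{ "index": { "_id": "1" } }
{ "message": "quick brown fox" }
{ "index": { "_id": "2" } }
{ "message": "two quick brown ferrets" }
{ "index": { "_id": "3" } }
{ "message": "the fox is quick and brown" }
```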

Then run the following query:

GET my-index-000001/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "quick brown f"
      }
    }
  }
}

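The first two documents are returned. The third is not: in "the fox is quick and brown", quick and brown are not adjacent, and brown is not followed by a term starting with f.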
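One caveat: according to the Elasticsearch reference, the prefix in the last term is expanded to at most max_expansions terms (50 by default), taken from the sorted term dictionary. On a larger index, a short prefix such as "f" may expand to terms that do not include fox, in which case the first document would be missed. The limit can be set explicitly; the value 10 below is an arbitrary illustration, not a recommendation:

```
# Same query, with an explicit cap on how many terms the prefix "f" may expand to
GET my-index-000001/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "quick brown f",
        "max_expansions": 10
      }
    }
  }
}
```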

