白话Elasticsearch08-深度探秘搜索技术之基于boost的细粒度搜索条件权重控制

本文涉及的产品
检索分析服务 Elasticsearch 版,2核4GB开发者规格 1个月
简介: 白话Elasticsearch08-深度探秘搜索技术之基于boost的细粒度搜索条件权重控制

20190806092132811.jpg


概述


继续跟中华石杉老师学习ES,第八篇

课程地址: https://www.roncoo.com/view/55


boost


https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-boost.html

知识点:


如果给某个字段设置boost 为2 ,则意味着改字段的权重比其他的值的权重大一倍 。权重值默认为1


The boost is applied only for term queries (prefix, range and fuzzy queries are not boosted).


示例


数据如下:

      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 1,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2019-05-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "articleID": "JODL-X-1937-#pV7",
          "userID": 2,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "hadoop"
          ],
          "tag_cnt": 1,
          "view_cnt": 100,
          "title": "this is elasticsearch blog"
        }
      }


需求: 搜索标题中必须包含blog的帖子,同时如果标题中包含java或elasticsearch或hadoop或spark也要搜索出来,同时如果一个帖子包含spark,包含spark的帖子要优先其他帖子搜索出来

需求实现DSL如下:

GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "blog"
        }
      },
      "should": [
        {
          "match": {
            "title": {
              "query": "java"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "elasticsearch"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "hadoop"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "spark",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}

返回结果 :

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.7260925,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 1.7260925,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2019-05-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1.6185135,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.8630463,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "articleID": "JODL-X-1937-#pV7",
          "userID": 2,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "hadoop"
          ],
          "tag_cnt": 1,
          "view_cnt": 100,
          "title": "this is elasticsearch blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.3971361,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog"
        }
      }
    ]
  }
}


可以看到spark的帖子,相关度得分最高,排在了第一位。


搜索条件的权重,boost,可以将某个搜索条件的权重加大,此时当匹配这个搜索条件和匹配另一个搜索条件的document,计算relevance score时,匹配权重更大的搜索条件的document,relevance score会更高,当然也就会优先被返回回来


我们如果把boost去掉会怎样呢? 来看下

GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "blog"
        }
      },
      "should": [
        {
          "match": {
            "title": {
              "query": "java"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "elasticsearch"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "hadoop"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "spark"
            }
          }
        }
      ]
    }
  }
}

返回:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.6185135,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1.6185135,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.8630463,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.5753642,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2019-05-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "articleID": "JODL-X-1937-#pV7",
          "userID": 2,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "hadoop"
          ],
          "tag_cnt": 1,
          "view_cnt": 100,
          "title": "this is elasticsearch blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.3971361,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog"
        }
      }
    ]
  }
}


spark的帖子并没有优先展示出来 ,可见boost权重确实起了作用。

相关实践学习
使用阿里云Elasticsearch体验信息检索加速
通过创建登录阿里云Elasticsearch集群,使用DataWorks将MySQL数据同步至Elasticsearch,体验多条件检索效果,简单展示数据同步和信息检索加速的过程和操作。
ElasticSearch 入门精讲
ElasticSearch是一个开源的、基于Lucene的、分布式、高扩展、高实时的搜索与数据分析引擎。根据DB-Engines的排名显示,Elasticsearch是最受欢迎的企业搜索引擎,其次是Apache Solr(也是基于Lucene)。 ElasticSearch的实现原理主要分为以下几个步骤: 用户将数据提交到Elastic Search 数据库中 通过分词控制器去将对应的语句分词,将其权重和分词结果一并存入数据 当用户搜索数据时候,再根据权重将结果排名、打分 将返回结果呈现给用户 Elasticsearch可以用于搜索各种文档。它提供可扩展的搜索,具有接近实时的搜索,并支持多租户。
相关文章
|
26天前
|
存储 人工智能 自然语言处理
阿里云Elasticsearch AI场景语义搜索最佳实践
本文介绍了如何使用阿里云Elasticsearch结合搜索开发工作台搭建AI语义搜索。
16548 67
|
19天前
|
数据采集 人工智能 安全
阿里云Elasticsearch 企业级AI搜索方案发布
本文从AI搜索落地的挑战、阿里云在RAG场景的实践、效果提升三个方面,深度解读阿里云Elasticsearch 企业级AI搜索方案。
178 8
|
17天前
|
数据采集 人工智能 自然语言处理
阿里云Elasticsearch AI语义搜索:解锁未来搜索新纪元,精准洞察数据背后的故事!
【8月更文挑战第2天】阿里云Elasticsearch AI场景语义搜索最佳实践
81 5
|
2月前
|
存储 监控 NoSQL
RedisSearch与Elasticsearch:技术对比与选择指南
RedisSearch与Elasticsearch:技术对比与选择指南
|
2月前
|
搜索推荐 开发者
如何在 Elasticsearch 中选择精确 kNN 搜索和近似 kNN 搜索
【6月更文挑战第8天】Elasticsearch 是一款强大的搜索引擎,支持精确和近似 kNN 搜索。精确 kNN 搜索保证高准确性但计算成本高,适用于对精度要求极高的场景。近似 kNN 搜索则通过牺牲部分精度来提升搜索效率,适合大数据量和实时性要求高的情况。开发者应根据业务需求和数据特性权衡选择。随着技术发展,kNN 搜索将在更多领域发挥关键作用。
54 4
|
1月前
|
运维 监控 Java
在大数据场景下,Elasticsearch作为分布式搜索与分析引擎,因其扩展性和易用性成为全文检索首选。
【7月更文挑战第1天】在大数据场景下,Elasticsearch作为分布式搜索与分析引擎,因其扩展性和易用性成为全文检索首选。本文讲解如何在Java中集成Elasticsearch,包括安装配置、使用RestHighLevelClient连接、创建索引和文档操作,以及全文检索查询。此外,还涉及高级查询、性能优化和故障排查,帮助开发者高效处理非结构化数据。
41 0
|
2月前
|
缓存 监控 索引
Elasticsearch中的post_filter后置过滤器技术
Elasticsearch中的post_filter后置过滤器技术
|
2月前
|
存储 JSON 自然语言处理
技术经验分享:Elasticsearch倒排索引结构
技术经验分享:Elasticsearch倒排索引结构
21 0
|
2月前
|
Java API 索引
必知的技术知识:Elasticsearch和Kibana安装
必知的技术知识:Elasticsearch和Kibana安装
25 0
|
2月前
|
缓存 Java API
在生产环境中部署Elasticsearch:最佳实践和故障排除技巧——聚合与搜索(三)
在生产环境中部署Elasticsearch:最佳实践和故障排除技巧——聚合与搜索(三)