ES搜索排序,文档相关度评分介绍——Field-length norm

简介:

Field-length norm

How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body field. The field length norm is calculated as follows:

norm(d) = 1 / √numTerms 

The field-length norm (norm) is the inverse square root of the number of terms in the field.

While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string field per document in the index, whether or not a document contains the field. Exact-value not_analyzed string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed fields as well:

PUT /my_index
{
  "mappings": { "doc": { "properties": { "text": { "type": "string", "norms": { "enabled": false }  } } } } }

This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length.

For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.

Putting it together

These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.

Tip

When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.

When we run a simple term query with explain set to true (see Understanding the Score), you will see that the only factors involved in calculating the score are the ones explained in the preceding sections:

PUT /my_index/doc/1 { "text" : "quick brown fox" } GET /my_index/doc/_search?explain { "query": { "term": { "text": "fox" } } }

The (abbreviated) explanation from the preceding request is as follows:

weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641 
result of:
    fieldWeight in 0                         0.15342641
    product of:
        tf(freq=1.0), with freq of 1:        1.0 
        idf(docFreq=1, maxDocs=1):           0.30685282 
        fieldNorm(doc=0):                    0.5 

The final score for term fox in field text in the document with internal Lucene doc ID 0.

The term fox appears once in the text field in this document.

The inverse document frequency of fox in the text field in all documents in this index.

The field-length normalization factor for this field.

Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.

 

 










本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/6474084.html ,如需转载请自行联系原作者



相关文章
|
消息中间件 分布式计算 监控
Python面试:消息队列(RabbitMQ、Kafka)基础知识与应用
【4月更文挑战第18天】本文探讨了Python面试中RabbitMQ与Kafka的常见问题和易错点,包括两者的基础概念、特性对比、Python客户端使用、消息队列应用场景及消息可靠性保证。重点讲解了消息丢失与重复的避免策略,并提供了实战代码示例,帮助读者提升在分布式系统中使用消息队列的能力。
671 2
|
Python
patchworklib,一款极其强大的 Python 库!
patchworklib,一款极其强大的 Python 库!
229 1
|
存储 开发工具 异构计算
第三章 硬件描述语言verilog(二) 功能描述-组合逻辑(下)
第三章 硬件描述语言verilog(二) 功能描述-组合逻辑
1948 0
第三章 硬件描述语言verilog(二) 功能描述-组合逻辑(下)
|
存储 监控 Java
Java多线程优化:提高线程池性能的技巧与实践
Java多线程优化:提高线程池性能的技巧与实践
583 1
|
传感器 存储 运维
智能物联网:LoRaWAN技术在低功耗广域网中的应用
【10月更文挑战第26天】本文详细介绍了LoRaWAN技术的基本原理、应用场景及实际应用示例。LoRaWAN是一种低功耗、长距离的网络层协议,适用于智能城市、农业、工业监控等领域。文章通过示例代码展示了如何使用LoRaWAN传输温湿度数据,并强调了其在物联网中的重要性和广阔前景。
552 6
|
XML JSON API
深入理解RESTful API设计:最佳实践与实现
【10月更文挑战第9天】深入理解RESTful API设计:最佳实践与实现
680 1
|
前端开发 JavaScript
去除router-link中的下划线
这篇文章介绍了如何在Vue.js中去除`<router-link>`元素的默认下划线样式,通过全局CSS覆盖来保持页面样式的整洁。
去除router-link中的下划线
|
Android开发
【苹果安卓通用】xlsx 和 vCard 文件转换器,txt转vCard文件格式,CSV转 vCard格式,如何批量号码导入手机通讯录,一篇文章说全
本文介绍了如何快速将批量号码导入手机通讯录,适用于企业客户管理、营销团队、活动组织、团队协作和新员工入职等场景。步骤包括:1) 下载软件,提供腾讯云盘和百度网盘链接;2) 打开软件,复制粘贴号码并进行加载预览和制作文件;3) 将制作好的文件通过QQ或微信发送至手机,然后按苹果、安卓或鸿蒙系统的指示导入。整个过程简便快捷,可在1分钟内完成。
976 6
|
存储 机器学习/深度学习 安全
|
Linux 程序员 Perl
老程序员分享:Linux查看系统开机时间
老程序员分享:Linux查看系统开机时间
1121 0