ES搜索排序,文档相关度评分介绍——Field-length norm

简介:

Field-length norm

How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body field. The field length norm is calculated as follows:

norm(d) = 1 / √numTerms 

The field-length norm (norm) is the inverse square root of the number of terms in the field.

While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string field per document in the index, whether or not a document contains the field. Exact-value not_analyzed string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed fields as well:

PUT /my_index
{
  "mappings": { "doc": { "properties": { "text": { "type": "string", "norms": { "enabled": false }  } } } } }

This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length.

For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.

Putting it together

These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.

Tip

When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.

When we run a simple term query with explain set to true (see Understanding the Score), you will see that the only factors involved in calculating the score are the ones explained in the preceding sections:

PUT /my_index/doc/1 { "text" : "quick brown fox" } GET /my_index/doc/_search?explain { "query": { "term": { "text": "fox" } } }

The (abbreviated) explanation from the preceding request is as follows:

weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641 
result of:
    fieldWeight in 0                         0.15342641
    product of:
        tf(freq=1.0), with freq of 1:        1.0 
        idf(docFreq=1, maxDocs=1):           0.30685282 
        fieldNorm(doc=0):                    0.5 

The final score for term fox in field text in the document with internal Lucene doc ID 0.

The term fox appears once in the text field in this document.

The inverse document frequency of fox in the text field in all documents in this index.

The field-length normalization factor for this field.

Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.

 

 










本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/6474084.html ,如需转载请自行联系原作者



相关文章
|
消息中间件 分布式计算 监控
Python面试:消息队列(RabbitMQ、Kafka)基础知识与应用
【4月更文挑战第18天】本文探讨了Python面试中RabbitMQ与Kafka的常见问题和易错点,包括两者的基础概念、特性对比、Python客户端使用、消息队列应用场景及消息可靠性保证。重点讲解了消息丢失与重复的避免策略,并提供了实战代码示例,帮助读者提升在分布式系统中使用消息队列的能力。
470 2
|
存储 监控 Java
Java多线程优化:提高线程池性能的技巧与实践
Java多线程优化:提高线程池性能的技巧与实践
434 1
|
12月前
|
Java 索引
Java“StringIndexOutOfBoundsException”解决
Java中的“StringIndexOutOfBoundsException”异常通常发生在尝试访问字符串中不存在的索引时。解决方法包括:1. 检查字符串长度,确保索引值在有效范围内;2. 使用条件语句避免越界访问;3. 对输入进行有效性验证。
1056 6
|
12月前
|
安全 Java 开发者
java的synchronized有几种加锁方式
Java的 `synchronized`通过上述三种加锁方式,为开发者提供了从粗粒度到细粒度的并发控制能力,满足了不同场景下的线程安全需求。合理选择加锁方式对于提升程序的并发性能和正确性至关重要,开发者应根据实际应用场景的特性和性能要求来决定使用哪种加锁策略。
156 0
java使用AOP切面获取请求日志并记录
java使用AOP切面获取请求日志并记录
|
算法 搜索推荐 Python
探索Python中的推荐系统:混合推荐模型
探索Python中的推荐系统:混合推荐模型
946 1
|
机器学习/深度学习 数据采集 算法
Value(低价值密度)
Value(低价值密度)
1397 1
|
Android开发
appium 安装 uiautomator2-server-debug-androidTest.apk 、appium-uiautomator2-server-v5.12.16.apk 失败的解决办法
appium 安装 uiautomator2-server-debug-androidTest.apk 、appium-uiautomator2-server-v5.12.16.apk 失败的解决办法
683 0
|
XML API Android开发
Android 自定义View 之 Dialog弹窗
Android 自定义View 之 Dialog弹窗
427 1
|
监控 数据安全/隐私保护 Android开发
通过UptimeRobot免费监控网站状态并使用邮箱+APP进行通知
这篇文章介绍了如何使用UptimeRobot免费监控网站的状态,并通过邮箱和APP进行通知。UptimeRobot是一个国外的网络监控服务,它可以定期检查网站或服务是否正常运行,并在发现故障或异常时发送警报通知给用户。文章详细介绍了如何注册、添加监控项目,并使用APP进行通知。最后还提到了可在APP中修改和添加监控项目,并更改手机名称的功能。
712 0
通过UptimeRobot免费监控网站状态并使用邮箱+APP进行通知