Lucene IndexOptions can be set to values from DOCS up to DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, and Elasticsearch exposes the same setting

Java code example for org.apache.lucene.index.IndexOptions

Project: languagetool   File: EmptyLuceneIndexCreator.java
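import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class EmptyLuceneIndexCreator {
  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.out.println("Usage: " + EmptyLuceneIndexCreator.class.getSimpleName() + " <indexPath>");
      System.exit(1);
    }
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    Directory directory = FSDirectory.open(new File(args[0]).toPath());
    IndexWriter writer = new IndexWriter(directory, config);
    // Index only document ids for this field: no frequencies, positions, or offsets.
    FieldType fieldType = new FieldType();
    fieldType.setIndexOptions(IndexOptions.DOCS);
    fieldType.setStored(true);
    Field countField = new Field("totalTokenCount", String.valueOf(0), fieldType);
    Document doc = new Document();
    doc.add(countField);
    writer.addDocument(doc);
    writer.close();
  }
}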
 
In Elasticsearch:
First of all, index_options and term_vectors are two totally different things.
index_options are "options" for the index you are searching on, a
data structure that maps "terms" to lists of documents (posting lists).
Term vectors are a data structure that gives you the "terms" for a given
document and, in addition, their positions in the document as well as their
start and end character offsets. The index (each field has its own) holds
a sorted list of terms, and each term points to a posting list. A posting
list is the list of documents that contain the term. A posting list can
also store frequencies (how often did term Y occur in document X, which is
useful for scoring) as well as "positions" (at which position did term Y
occur in document X, which is required for phrase and span queries).
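
The structure described above can be sketched in plain Java. This is only a toy model for illustration, not Lucene's actual on-disk format: the class name ToyPostings, its methods, and the whitespace tokenizer are all hypothetical. Each term maps to a posting list keyed by document id, and each posting keeps the term's positions, from which the frequency follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional inverted index: term -> (docId -> positions). Hypothetical, not Lucene's format. */
final class ToyPostings {
    // Sorted term dictionary; each term points to its posting list, keyed by doc id.
    private final Map<String, TreeMap<Integer, List<Integer>>> index = new TreeMap<>();

    /** Splits on whitespace (a stand-in for a real analyzer) and records every term position. */
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            index.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                 .computeIfAbsent(docId, d -> new ArrayList<>())
                 .add(pos);
        }
    }

    /** "docs" level: which documents contain the term. */
    List<Integer> docs(String term) {
        return new ArrayList<>(index.getOrDefault(term, new TreeMap<>()).keySet());
    }

    /** "freqs" level: how often the term occurs in one document. */
    int freq(String term, int docId) {
        return positions(term, docId).size();
    }

    /** "positions" level: where the term occurs in one document. */
    List<Integer> positions(String term, int docId) {
        return index.getOrDefault(term, new TreeMap<>()).getOrDefault(docId, List.of());
    }

    /** Two-term phrase match: needs positions, because "second" must directly follow "first". */
    boolean matchesPhrase(String first, String second, int docId) {
        List<Integer> next = positions(second, docId);
        for (int p : positions(first, docId)) {
            if (next.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyPostings idx = new ToyPostings();
        idx.add(1, "quick brown fox");
        idx.add(2, "brown quick fox brown");
        System.out.println(idx.docs("brown"));                     // [1, 2]
        System.out.println(idx.freq("brown", 2));                  // 2
        System.out.println(idx.positions("brown", 2));             // [0, 3]
        System.out.println(idx.matchesPhrase("quick", "brown", 1)); // true
        System.out.println(idx.matchesPhrase("quick", "brown", 2)); // false
    }
}
```

Both documents contain "quick" and "brown", so a docs-only index cannot tell them apart for the phrase "quick brown"; only the stored positions can.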

If, for instance, you have a field that you only use for filtering, you
don't need freqs and positions, so docs only will do the job. In an index,
position information is usually the biggest piece of data aside from stored
fields. If you don't run phrase or span queries you don't need positions at
all, so save the disk space and improve performance by indexing only docs
and freqs. In previous versions it wasn't possible to have freqs without
positions (index_options supersedes omit_term_frequencies_and_positions),
so this is an improvement overall, since the most common use case might
only need freqs but no positions.
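
The rule of thumb in this reply can be written down as a small decision helper. This is a minimal sketch under stated assumptions: the class IndexOptionsChooser and its two boolean flags are hypothetical names invented for illustration, and the returned strings are the index_options values docs, freqs, and positions.

```java
/** Hypothetical helper: picks the smallest index_options level a field's usage requires. */
final class IndexOptionsChooser {

    static String choose(boolean needsScoring, boolean needsPhraseOrSpan) {
        if (needsPhraseOrSpan) {
            return "positions"; // phrase and span queries need term positions (freqs are included)
        }
        if (needsScoring) {
            return "freqs";     // frequencies are enough for scoring; skip the costly positions
        }
        return "docs";          // pure filtering only needs the list of matching documents
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false)); // docs
        System.out.println(choose(true, false));  // freqs
        System.out.println(choose(true, true));   // positions
    }
}
```

Because each level includes everything below it, asking for positions never takes scoring away; the only cost is the extra disk space.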
 
Some of the options, for reference:
1: term_vector
TermVector.YES: Store only the terms and their number of occurrences.
TermVector.WITH_POSITIONS: Store the number of occurrences and the positions of terms, but no offsets.
TermVector.WITH_OFFSETS: Store the number of occurrences and the offsets of terms, but no positions.
TermVector.WITH_POSITIONS_OFFSETS: Store the number of occurrences, positions, and offsets of terms.
TermVector.NO: Don't store any term vector information.
2: index_options
Sets the indexing options. Possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), positions (doc numbers, term frequencies, and positions), and offsets (doc numbers, term frequencies, positions, and offsets). Defaults to positions for analyzed fields and to docs for not_analyzed fields.
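
To connect the two halves of this post, the sketch below pairs each Elasticsearch index_options value with the name of the corresponding Lucene IndexOptions constant. It is plain Java with no Lucene dependency, and the class IndexOptionsNames and its method are hypothetical. The constant names follow org.apache.lucene.index.IndexOptions as used in the example at the top; in Lucene 4.x, which the reference link below documents, the enum is FieldInfo.IndexOptions and the first constant is called DOCS_ONLY.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup table: Elasticsearch index_options value -> Lucene IndexOptions constant name. */
final class IndexOptionsNames {

    static final Map<String, String> ES_TO_LUCENE = new LinkedHashMap<>();

    static {
        ES_TO_LUCENE.put("docs", "DOCS");
        ES_TO_LUCENE.put("freqs", "DOCS_AND_FREQS");
        ES_TO_LUCENE.put("positions", "DOCS_AND_FREQS_AND_POSITIONS");
        ES_TO_LUCENE.put("offsets", "DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS");
    }

    /** Returns the Lucene constant name, or fails loudly on a value that is not listed above. */
    static String luceneName(String esValue) {
        String name = ES_TO_LUCENE.get(esValue);
        if (name == null) {
            throw new IllegalArgumentException("unknown index_options value: " + esValue);
        }
        return name;
    }

    public static void main(String[] args) {
        // Print the table in order, from the least to the most information stored.
        for (Map.Entry<String, String> e : ES_TO_LUCENE.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```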
 
References: https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/FieldInfo.IndexOptions.html
http://elasticsearch.cn/question/119

This article is reposted from the cnblogs blog of 张昺华-sky. Original link: http://www.cnblogs.com/bonelee/p/6397455.html. To repost it, please contact the original author.