Lucene - Full-Text Indexing - Alibaba Cloud Developer Community

Lucene - Full-Text Indexing

2015-06-28


Introduction:

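I recently started working with Lucene, which I expect many people have heard of. Out of curiosity I began studying it, and what impressed me most is how heavily it relies on index tables; that heavy use of indexes is the main reason the tool is so fast. Today I will only cover the small example I started with: creating an index.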
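To make the "index table" idea concrete, here is a minimal sketch of an inverted index in plain Java. It is not Lucene code and not how Lucene stores its index on disk; the class `InvertedIndexSketch` and its methods are hypothetical names invented for illustration. It only shows why a lookup by term can avoid scanning every document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Split the text on anything that is not a letter or digit,
    // then record the document id under each lowercased term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A search is a single map lookup rather than a scan over all documents.
    public List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "Lucene is a search library");
        index.add(1, "An index makes search fast");
        index.add(2, "Lucene builds an index");
        System.out.println(index.search("lucene")); // [0, 2]
        System.out.println(index.search("index"));  // [1, 2]
        System.out.println(index.search("google")); // []
    }
}
```

The work of splitting text into terms is paid once, when a document is added; each query afterwards only touches the entry for the term it asks about.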

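Here is a conceptual introduction to Lucene. Lucene is an information retrieval library. With it you can add indexing and search capabilities to your own application. Users of Lucene do not need deep knowledge of full-text retrieval; learning to use a few classes in the library is enough to add full-text search to an application. Do not mistake Lucene for a search engine like Google, though. Lucene is not even an application; it is just a tool, a library. You can also think of it as a simple, easy-to-use API that neatly encapsulates indexing and search, and that makes many search-related tasks convenient.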

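So what can Lucene do? Lucene can index and search any kind of data. It does not care what format the data source is in; as long as the data can be converted to text, Lucene can analyze and use it. In other words, whether the file is MS Word, HTML, PDF, or something else, once you can extract its textual content you can index and search it with Lucene. Here is a small example I wrote that creates an index: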

<span style="font-size:14px;">package com.jikexueyuan.study;
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class IndexCreate {

	public static void main(String[] args) {
		// StandardAnalyzer splits English text on whitespace and punctuation,
		// and splits Chinese text into single characters (one character = one term).
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
		// The writer uses this analyzer to tokenize text fields and stores the
		// resulting terms in the index files, which is what makes lookups fast.
		IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_46, analyzer);
		indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
		Directory directory = null;
		IndexWriter indexWriter = null;
		try {
			// Directory is the abstract location of the index files. FSDirectory keeps
			// them on disk (this folder); RAMDirectory keeps them in memory.
			directory = FSDirectory.open(new File("E://index/test"));
			// isLocked and unlock are static methods, so call them on the class.
			// This clears a write lock left behind in the index directory.
			if (IndexWriter.isLocked(directory)) {
				IndexWriter.unlock(directory);
			}
			indexWriter = new IndexWriter(directory, indexWriterConfig);

			Document doc = new Document();
			doc.add(new StringField("id", "abcde", Store.YES)); // stored as one untokenized term
			doc.add(new TextField("content", "极客学院", Store.YES)); // tokenized by the analyzer
			doc.add(new IntField("num", 1, Store.YES));
			indexWriter.addDocument(doc); // add the document to the index (insert)

			Document doc1 = new Document();
			doc1.add(new StringField("id", "sdfsd", Store.YES));
			doc1.add(new TextField("content", "Lucene案例", Store.YES));
			doc1.add(new IntField("num", 1, Store.YES));
			indexWriter.addDocument(doc1);

			indexWriter.commit();
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// Close the writer and the directory even if indexing failed part-way.
			try {
				if (indexWriter != null) {
					indexWriter.close();
				}
				if (directory != null) {
					directory.close();
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}
</span>

Running the program generates a series of index files in the index directory, as shown in the figure below:

From the example above we can see that creating an index involves five elements:

1. IndexWriter

2. Directory

3. Analyzer

4. Document

5. Field
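Of these five, the Analyzer is the least visible in the example, so here is a rough plain-Java sketch of the tokenization behavior described in the code comments: English split on whitespace and punctuation, Chinese split into single characters. This is an approximation for illustration only, not Lucene's `StandardAnalyzer` (which, among other differences, also removes English stop words by default in this version). The class `SimpleTokenizerSketch` and its `tokenize` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; a rough approximation, not Lucene's StandardAnalyzer.
class SimpleTokenizerSketch {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                // Non-Chinese letters and digits accumulate into the current word.
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            // Whitespace, punctuation, or a Chinese character ends the current word.
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            // Each Chinese character becomes a token of its own.
            if (han) {
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene案例"));               // [lucene, 案, 例]
        System.out.println(tokenize("极客学院"));                  // [极, 客, 学, 院]
        System.out.println(tokenize("Hello, full-text search!")); // [hello, full, text, search]
    }
}
```

Under this kind of tokenization, a query for a single Chinese character such as 案 would match the second document in the example, because each character is indexed as its own term.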

I will keep sharing what I learn about Lucene, and I hope more and more people will join in!
