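This time we'll look at sorting in Lucene. Sharp-eyed readers will already have noticed that the search method of the IndexSearcher class has quite a few overloads: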
- /** Finds the top <code>n</code>
- * hits for <code>query</code>.
- *
- * @throws BooleanQuery.TooManyClauses If a query would exceed
- * {@link BooleanQuery#getMaxClauseCount()} clauses.
- */
- public TopDocs search(Query query, int n)
- throws IOException {
- return search(query, null, n);
- }
- /** Finds the top <code>n</code>
- * hits for <code>query</code>, applying <code>filter</code> if non-null.
- *
- * @throws BooleanQuery.TooManyClauses If a query would exceed
- * {@link BooleanQuery#getMaxClauseCount()} clauses.
- */
- public TopDocs search(Query query, Filter filter, int n)
- throws IOException {
- return search(createNormalizedWeight(wrapFilter(query, filter)), null, n);
- }
- /** Lower-level search API.
- *
- * <p>{@link LeafCollector#collect(int)} is called for every matching
- * document.
- *
- * @param query to match documents
- * @param filter if non-null, used to permit documents to be collected.
- * @param results to receive hits
- * @throws BooleanQuery.TooManyClauses If a query would exceed
- * {@link BooleanQuery#getMaxClauseCount()} clauses.
- */
- public void search(Query query, Filter filter, Collector results)
- throws IOException {
- search(leafContexts, createNormalizedWeight(wrapFilter(query, filter)), results);
- }
- /** Lower-level search API.
- *
- * <p>{@link LeafCollector#collect(int)} is called for every matching document.
- *
- * @throws BooleanQuery.TooManyClauses If a query would exceed
- * {@link BooleanQuery#getMaxClauseCount()} clauses.
- */
- public void search(Query query, Collector results)
- throws IOException {
- search(leafContexts, createNormalizedWeight(query), results);
- }
- /** Search implementation with arbitrary sorting. Finds
- * the top <code>n</code> hits for <code>query</code>, applying
- * <code>filter</code> if non-null, and sorting the hits by the criteria in
- * <code>sort</code>.
- *
- * <p>NOTE: this does not compute scores by default; use
- * {@link IndexSearcher#search(Query,Filter,int,Sort,boolean,boolean)} to
- * control scoring.
- *
- * @throws BooleanQuery.TooManyClauses If a query would exceed
- * {@link BooleanQuery#getMaxClauseCount()} clauses.
- */
- public TopFieldDocs search(Query query, Filter filter, int n,
- Sort sort) throws IOException {
- return search(createNormalizedWeight(wrapFilter(query, filter)), n, sort, false, false);
- }
- /** Search implementation with arbitrary sorting, plus
- * control over whether hit scores and max score
- * should be computed. Finds
- * the top <code>n</code> hits for <code>query</code>, applying
- * <code>filter</code> if non-null, and sorting the hits by the criteria in
- * <code>sort</code>. If <code>doDocScores</code> is <code>true</code>
- * then the score of each hit will be computed and
- * returned. If <code>doMaxScore</code> is
- * <code>true</code> then the maximum score over all
- * collected hits will be computed.
- *
- * @throws BooleanQuery.TooManyClauses If a query would exceed
- * {@link BooleanQuery#getMaxClauseCount()} clauses.
- */
- public TopFieldDocs search(Query query, Filter filter, int n,
- Sort sort, boolean doDocScores, boolean doMaxScore) throws IOException {
- return search(createNormalizedWeight(wrapFilter(query, filter)), n, sort, doDocScores, doMaxScore);
- }
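The query parameter needs no explanation. filter applies an additional filter to the matches, int n means only the top N hits are returned, and Sort is the object that describes the sort order.
doDocScores is the important one: it controls whether the relevance score of each hit is computed and returned. If you set it to false, the score of every returned document is NaN.
doMaxScore controls whether the maximum score over all collected hits is computed, exactly as the Javadoc above says. It does not change how the scores of the clauses of a BooleanQuery are combined (taking the best score among sub-queries is what DisjunctionMaxQuery is for). The overload search(query, filter, n, sort) passes false for both flags.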
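Note: in the Lucene 3.x days these two flags could be set on the IndexSearcher itself, like this:
- searcher.setDefaultFieldSortScoring(true, false);
That setter was removed in Lucene 4.0, so in Lucene 4.x and 5.x you can only pass the two flags when you call search, like this: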
- searcher.search(query, filter, n, sort, doDocScores, doMaxScore);
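Going by the Javadoc quoted earlier, the effect of the two flags can be pictured with a toy model. This is a plain-JDK sketch, not Lucene code: ScoringFlagsSketch, Result and collect are names I made up, and leaving the maximum at NaN when doMaxScore is off is my assumption about the behavior.

```java
import java.util.Arrays;

/** Toy model of the two scoring flags; plain JDK, not Lucene code. */
final class ScoringFlagsSketch {

    /** Scores reported for the hits, plus the overall maximum. */
    static final class Result {
        final float[] scores;
        final float maxScore;

        Result(float[] scores, float maxScore) {
            this.scores = scores;
            this.maxScore = maxScore;
        }
    }

    /**
     * rawScores stands in for the relevance each hit would get if it were scored.
     * A flag that is switched off leaves NaN in the corresponding slot.
     */
    static Result collect(float[] rawScores, boolean doDocScores, boolean doMaxScore) {
        float[] reported = new float[rawScores.length];
        Arrays.fill(reported, Float.NaN);
        if (doDocScores) {
            System.arraycopy(rawScores, 0, reported, 0, rawScores.length);
        }
        float max = Float.NaN;
        if (doMaxScore) {
            max = Float.NEGATIVE_INFINITY;
            for (float s : rawScores) {
                max = Math.max(max, s);
            }
        }
        return new Result(reported, max);
    }

    public static void main(String[] args) {
        float[] raw = {0.151398f, 1.052735f, 0.447534f};
        Result scored = collect(raw, true, false);
        System.out.println(Arrays.toString(scored.scores) + " max=" + scored.maxScore);
        Result maxOnly = collect(raw, false, true);
        System.out.println(Arrays.toString(maxOnly.scores) + " max=" + maxOnly.maxScore);
    }
}
```

The point of the sketch is only that the two flags are independent: one decides whether each hit carries a score, the other whether a single maximum is tracked across all hits.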
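From the method signatures we can see that, to change the default behavior of sorting by score in descending order, we have to pass in a Sort object. So let's take a look at the source of the Sort class: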
- package org.apache.lucene.search;
- /*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
- import java.io.IOException;
- import java.util.Arrays;
- /**
- * Encapsulates sort criteria for returned hits.
- *
- * <p>The fields used to determine sort order must be carefully chosen.
- * Documents must contain a single term in such a field,
- * and the value of the term should indicate the document's relative position in
- * a given sort order. The field must be indexed, but should not be tokenized,
- * and does not need to be stored (unless you happen to want it back with the
- * rest of your document data). In other words:
- *
- * <p><code>document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));</code></p>
- *
- *
- * <p><h3>Valid Types of Values</h3>
- *
- * <p>There are four possible kinds of term values which may be put into
- * sorting fields: Integers, Longs, Floats, or Strings. Unless
- * {@link SortField SortField} objects are specified, the type of value
- * in the field is determined by parsing the first term in the field.
- *
- * <p>Integer term values should contain only digits and an optional
- * preceding negative sign. Values must be base 10 and in the range
- * <code>Integer.MIN_VALUE</code> and <code>Integer.MAX_VALUE</code> inclusive.
- * Documents which should appear first in the sort
- * should have low value integers, later documents high values
- * (i.e. the documents should be numbered <code>1..n</code> where
- * <code>1</code> is the first and <code>n</code> the last).
- *
- * <p>Long term values should contain only digits and an optional
- * preceding negative sign. Values must be base 10 and in the range
- * <code>Long.MIN_VALUE</code> and <code>Long.MAX_VALUE</code> inclusive.
- * Documents which should appear first in the sort
- * should have low value integers, later documents high values.
- *
- * <p>Float term values should conform to values accepted by
- * {@link Float Float.valueOf(String)} (except that <code>NaN</code>
- * and <code>Infinity</code> are not supported).
- * Documents which should appear first in the sort
- * should have low values, later documents high values.
- *
- * <p>String term values can contain any valid String, but should
- * not be tokenized. The values are sorted according to their
- * {@link Comparable natural order}. Note that using this type
- * of term value has higher memory requirements than the other
- * two types.
- *
- * <p><h3>Object Reuse</h3>
- *
- * <p>One of these objects can be
- * used multiple times and the sort order changed between usages.
- *
- * <p>This class is thread safe.
- *
- * <p><h3>Memory Usage</h3>
- *
- * <p>Sorting uses of caches of term values maintained by the
- * internal HitQueue(s). The cache is static and contains an integer
- * or float array of length <code>IndexReader.maxDoc()</code> for each field
- * name for which a sort is performed. In other words, the size of the
- * cache in bytes is:
- *
- * <p><code>4 * IndexReader.maxDoc() * (# of different fields actually used to sort)</code>
- *
- * <p>For String fields, the cache is larger: in addition to the
- * above array, the value of every term in the field is kept in memory.
- * If there are many unique terms in the field, this could
- * be quite large.
- *
- * <p>Note that the size of the cache is not affected by how many
- * fields are in the index and <i>might</i> be used to sort - only by
- * the ones actually used to sort a result set.
- *
- * <p>Created: Feb 12, 2004 10:53:57 AM
- *
- * @since lucene 1.4
- */
- public class Sort {
- /**
- * Represents sorting by computed relevance. Using this sort criteria returns
- * the same results as calling
- * {@link IndexSearcher#search(Query,int) IndexSearcher#search()}without a sort criteria,
- * only with slightly more overhead.
- */
- public static final Sort RELEVANCE = new Sort();
- /** Represents sorting by index order. */
- public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);
- // internal representation of the sort criteria
- SortField[] fields;
- /**
- * Sorts by computed relevance. This is the same sort criteria as calling
- * {@link IndexSearcher#search(Query,int) IndexSearcher#search()}without a sort criteria,
- * only with slightly more overhead.
- */
- public Sort() {
- this(SortField.FIELD_SCORE);
- }
- /** Sorts by the criteria in the given SortField. */
- public Sort(SortField field) {
- setSort(field);
- }
- /** Sets the sort to the given criteria in succession: the
- * first SortField is checked first, but if it produces a
- * tie, then the second SortField is used to break the tie,
- * etc. Finally, if there is still a tie after all SortFields
- * are checked, the internal Lucene docid is used to break it. */
- public Sort(SortField... fields) {
- setSort(fields);
- }
- /** Sets the sort to the given criteria. */
- public void setSort(SortField field) {
- this.fields = new SortField[] { field };
- }
- /** Sets the sort to the given criteria in succession: the
- * first SortField is checked first, but if it produces a
- * tie, then the second SortField is used to break the tie,
- * etc. Finally, if there is still a tie after all SortFields
- * are checked, the internal Lucene docid is used to break it. */
- public void setSort(SortField... fields) {
- this.fields = fields;
- }
- /**
- * Representation of the sort criteria.
- * @return Array of SortField objects used in this sort criteria
- */
- public SortField[] getSort() {
- return fields;
- }
- /**
- * Rewrites the SortFields in this Sort, returning a new Sort if any of the fields
- * changes during their rewriting.
- *
- * @param searcher IndexSearcher to use in the rewriting
- * @return {@code this} if the Sort/Fields have not changed, or a new Sort if there
- * is a change
- * @throws IOException Can be thrown by the rewriting
- * @lucene.experimental
- */
- public Sort rewrite(IndexSearcher searcher) throws IOException {
- boolean changed = false;
- SortField[] rewrittenSortFields = new SortField[fields.length];
- for (int i = 0; i < fields.length; i++) {
- rewrittenSortFields[i] = fields[i].rewrite(searcher);
- if (fields[i] != rewrittenSortFields[i]) {
- changed = true;
- }
- }
- return (changed) ? new Sort(rewrittenSortFields) : this;
- }
- @Override
- public String toString() {
- StringBuilder buffer = new StringBuilder();
- for (int i = 0; i < fields.length; i++) {
- buffer.append(fields[i].toString());
- if ((i+1) < fields.length)
- buffer.append(',');
- }
- return buffer.toString();
- }
- /** Returns true if <code>o</code> is equal to this. */
- @Override
- public boolean equals(Object o) {
- if (this == o) return true;
- if (!(o instanceof Sort)) return false;
- final Sort other = (Sort)o;
- return Arrays.equals(this.fields, other.fields);
- }
- /** Returns a hash code value for this object. */
- @Override
- public int hashCode() {
- return 0x45aaf665 + Arrays.hashCode(fields);
- }
- /** Returns true if the relevance score is needed to sort documents. */
- public boolean needsScores() {
- for (SortField sortField : fields) {
- if (sortField.needsScores()) {
- return true;
- }
- }
- return false;
- }
- }
First, it defines two static constants:
public static final Sort RELEVANCE = new Sort();
public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);
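RELEVANCE sorts by relevance score.
INDEXORDER sorts in index order. What does that mean? It means sorting by the docID of each document. The docID is not a field that Lucene adds to your document; it is the internal number Lucene assigns to every document in the index. If you don't change the default sort behavior, hits are ordered by relevance score in descending order first (provided scoring is enabled), and when two documents have the same score they are ordered by docID in ascending order.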
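The two orderings can be mimicked with ordinary comparators. The sketch below uses only the JDK; Hit, RELEVANCE and INDEX_ORDER are stand-in names of mine, and the docIDs and scores are taken from the demo output further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Plain-JDK imitation of the two built-in orderings; not Lucene code. */
final class DefaultOrderSketch {

    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Higher score first; equal scores fall back to the lower docID. */
    static final Comparator<Hit> RELEVANCE = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            int byScore = Float.compare(b.score, a.score); // descending
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        }
    };

    /** Index order: ascending docID, score ignored. */
    static final Comparator<Hit> INDEX_ORDER = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            return Integer.compare(a.docId, b.docId);
        }
    };

    static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit(9, 1.052735f), new Hit(0, 0.151398f), new Hit(11, 0.447534f),
                new Hit(6, 1.052735f), new Hit(8, 0.429442f));
    }

    /** Sorts a copy of the hits and returns the docIDs in the resulting order. */
    static int[] sortedIds(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> copy = new ArrayList<Hit>(hits);
        Collections.sort(copy, order);
        int[] ids = new int[copy.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = copy.get(i).docId;
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedIds(sampleHits(), RELEVANCE)));
        System.out.println(Arrays.toString(sortedIds(sampleHits(), INDEX_ORDER)));
    }
}
```

Documents 6 and 9 share the same score, so under the relevance ordering the lower docID (6) comes first, which is what the demo output shows as well.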
Next come Sort's constructors, which take SortField objects. One of them deserves your attention:
- public Sort(SortField... fields) {
- setSort(fields);
- }
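The parameter of this constructor is a Java varargs parameter. As a quick illustration of what that allows, here is a JDK-only sketch; VarargsSketch and describe are hypothetical names, and the comma-joined result merely imitates the shape of Sort.toString() from the listing above.

```java
/** Shows that a varargs parameter accepts separate arguments or an array. */
final class VarargsSketch {

    /** Joins the given sort keys with commas, in the order they were passed. */
    static String describe(String... keys) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(keys[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Separate arguments: the compiler packs them into a String[].
        System.out.println(describe("category", "score", "pubmonth"));
        // An explicit array is accepted as well.
        System.out.println(describe(new String[] {"category", "score", "pubmonth"}));
        // Zero arguments are legal too and yield an empty array.
        System.out.println(describe().isEmpty());
    }
}
```

Inside the method the parameter is an ordinary array, which is why Sort can simply store it in its SortField[] member.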
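The SortField... fields syntax is varargs, which was introduced in Java 5. It is similar to the older SortField[] fields form, with one difference: the caller can pass field1, field2, field3, ..., fieldN as separate arguments, although passing an array still works. What I really want to say is that Sort accepts multiple SortFields, which means it can sort on several fields at once, just like ORDER BY age, id in SQL. Oops, I've wandered off topic again. Let's move on to the SortField source and see what's inside: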
- public class SortField {
- /**
- * Specifies the type of the terms to be sorted, or special types such as CUSTOM
- */
- public static enum Type {
- /** Sort by document score (relevance). Sort values are Float and higher
- * values are at the front. */
- SCORE,
- /** Sort by document number (index order). Sort values are Integer and lower
- * values are at the front. */
- DOC,
- /** Sort using term values as Strings. Sort values are String and lower
- * values are at the front. */
- STRING,
- /** Sort using term values as encoded Integers. Sort values are Integer and
- * lower values are at the front. */
- INT,
- /** Sort using term values as encoded Floats. Sort values are Float and
- * lower values are at the front. */
- FLOAT,
- /** Sort using term values as encoded Longs. Sort values are Long and
- * lower values are at the front. */
- LONG,
- /** Sort using term values as encoded Doubles. Sort values are Double and
- * lower values are at the front. */
- DOUBLE,
- /** Sort using a custom Comparator. Sort values are any Comparable and
- * sorting is done according to natural order. */
- CUSTOM,
- /** Sort using term values as Strings, but comparing by
- * value (using String.compareTo) for all comparisons.
- * This is typically slower than {@link #STRING}, which
- * uses ordinals to do the sorting. */
- STRING_VAL,
- /** Sort use byte[] index values. */
- BYTES,
- /** Force rewriting of SortField using {@link SortField#rewrite(IndexSearcher)}
- * before it can be used for sorting */
- REWRITEABLE
- }
- /** Represents sorting by document score (relevance). */
- public static final SortField FIELD_SCORE = new SortField(null, Type.SCORE);
- /** Represents sorting by document number (index order). */
- public static final SortField FIELD_DOC = new SortField(null, Type.DOC);
- private String field;
- private Type type; // defaults to determining type dynamically
- boolean reverse = false; // defaults to natural order
- // Used for CUSTOM sort
- private FieldComparatorSource comparatorSource;
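The first thing you see is Type, an enum of the available sort types:
SCORE: sort by score, descending by default.
DOC: sort by document ID. Score is the only type that is descending by default; all the others are ascending.
STRING: sort by the field's term values as strings. According to the Javadoc, the comparison uses ordinals rather than the string values themselves.
STRING_VAL: also sorts by the field's term values as strings, but every comparison goes through String.compareTo, which is why STRING_VAL is typically slower than STRING.
INT, FLOAT, DOUBLE and LONG are the numeric counterparts, so I won't go into them.
CUSTOM: custom sorting. It works together with the member variable
private FieldComparatorSource comparatorSource; that is, you supply your own comparator, and it decides the sort order.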
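To get a feel for the difference between comparing by ordinal and comparing by value, here is a small JDK-only sketch. It is my own simplification, not how Lucene stores ordinals internally: OrdinalSortSketch and its methods are hypothetical, and the category strings come from the demo data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Contrasts ordinal-based and value-based string comparison; not Lucene code. */
final class OrdinalSortSketch {

    /** Maps each distinct value to its rank in sorted order (its ordinal). */
    static Map<String, Integer> buildOrdinals(String[] values) {
        TreeSet<String> sorted = new TreeSet<String>(Arrays.asList(values));
        Map<String, Integer> ords = new HashMap<String, Integer>();
        int next = 0;
        for (String v : sorted) {
            ords.put(v, next++);
        }
        return ords;
    }

    /** STRING-style comparison: compares two precomputed ints. */
    static int compareByOrdinal(Map<String, Integer> ords, String a, String b) {
        return Integer.compare(ords.get(a), ords.get(b));
    }

    /** STRING_VAL-style comparison: walks the characters on every call. */
    static int compareByValue(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        String[] categories = {
            "/health", "/education/pedagogy", "/technology/computers/ai", "/health"
        };
        Map<String, Integer> ords = buildOrdinals(categories);
        System.out.println(ords.get("/health"));
        System.out.println(compareByOrdinal(ords, "/health", "/technology/computers/ai") < 0);
        System.out.println(compareByValue("/health", "/technology/computers/ai") < 0);
    }
}
```

Both comparisons agree on the order. The ordinal variant pays once to rank the distinct values and afterwards only compares integers, which is the intuition behind STRING being the faster of the two.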
Besides the custom comparator just mentioned, SortField has three other important member variables:
- private String field;
- private Type type; // defaults to determining type dynamically
- boolean reverse = false; // defaults to natural order
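field is, unsurprisingly, the name of the field you want to sort on.
type is the sort type explained above, that is, what to sort by: score, docID, and so on.
reverse says whether to invert the natural order, turning ascending into descending and descending into ascending. For example, score is descending by default, so with reverse set to true it sorts ascending, while any other field sorts descending. reverse defaults to false.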
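The reverse flag is easy to picture with plain comparators. The following is a JDK-only sketch with hypothetical names (ReverseFlagSketch, withReverse); the natural orders are chosen to match what is described above: ascending for an int field, descending for score.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

/** Imitates the reverse flag with JDK comparators; not Lucene code. */
final class ReverseFlagSketch {

    /** Natural order of an int field: ascending. */
    static final Comparator<Integer> INT_NATURAL = Comparator.naturalOrder();

    /** Natural order of score: descending. */
    static final Comparator<Float> SCORE_NATURAL = Comparator.reverseOrder();

    /** Returns the natural order unchanged, or flipped when reverse is true. */
    static <T> Comparator<T> withReverse(Comparator<T> natural, boolean reverse) {
        return reverse ? Collections.reverseOrder(natural) : natural;
    }

    static Integer[] sortPubmonths(boolean reverse, Integer... values) {
        Integer[] copy = values.clone();
        Arrays.sort(copy, withReverse(INT_NATURAL, reverse));
        return copy;
    }

    static Float[] sortScores(boolean reverse, Float... values) {
        Float[] copy = values.clone();
        Arrays.sort(copy, withReverse(SCORE_NATURAL, reverse));
        return copy;
    }

    public static void main(String[] args) {
        // An int field with reverse=true ends up descending.
        System.out.println(Arrays.toString(sortPubmonths(true, 200403, 201005, 199903)));
        // Score with reverse=true ends up ascending.
        System.out.println(Arrays.toString(sortScores(true, 1.052735f, 0.151398f, 0.447534f)));
    }
}
```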
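OK, with all that covered, I trust you now have a clear idea of how to implement your own custom sorting of indexed documents. Below is the test demo I wrote, for your reference.
First, create the index documents used for testing: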
- package com.yida.framework.lucene5.sort;
- import java.io.File;
- import java.io.FileInputStream;
- import java.io.IOException;
- import java.nio.file.Paths;
- import java.text.ParseException;
- import java.util.ArrayList;
- import java.util.Date;
- import java.util.List;
- import java.util.Properties;
- import org.apache.lucene.analysis.Analyzer;
- import org.apache.lucene.analysis.standard.StandardAnalyzer;
- import org.apache.lucene.document.DateTools;
- import org.apache.lucene.document.Document;
- import org.apache.lucene.document.Field;
- import org.apache.lucene.document.IntField;
- import org.apache.lucene.document.NumericDocValuesField;
- import org.apache.lucene.document.SortedDocValuesField;
- import org.apache.lucene.document.SortedNumericDocValuesField;
- import org.apache.lucene.document.StringField;
- import org.apache.lucene.document.TextField;
- import org.apache.lucene.index.IndexWriter;
- import org.apache.lucene.index.IndexWriterConfig;
- import org.apache.lucene.index.IndexWriterConfig.OpenMode;
- import org.apache.lucene.store.Directory;
- import org.apache.lucene.store.FSDirectory;
- import org.apache.lucene.util.BytesRef;
- /**
- * Creates the test index
- * @author Lanxiaowei
- *
- */
- public class CreateTestIndex {
- public static void main(String[] args) throws IOException {
- String dataDir = "C:/data";
- String indexDir = "C:/lucenedir";
- Directory dir = FSDirectory.open(Paths.get(indexDir));
- Analyzer analyzer = new StandardAnalyzer();
- IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
- indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
- IndexWriter writer = new IndexWriter(dir, indexWriterConfig);
- List<File> results = new ArrayList<File>();
- findFiles(results, new File(dataDir));
- System.out.println(results.size() + " books to index");
- for (File file : results) {
- Document doc = getDocument(dataDir, file);
- writer.addDocument(doc);
- }
- writer.close();
- dir.close();
- }
- /**
- * Finds all .properties files under the given directory
- *
- * @param result
- * @param dir
- */
- private static void findFiles(List<File> result, File dir) {
- for (File file : dir.listFiles()) {
- if (file.getName().endsWith(".properties")) {
- result.add(file);
- } else if (file.isDirectory()) {
- findFiles(result, file);
- }
- }
- }
- /**
- * Reads a properties file and builds a Document from it
- *
- * @param rootDir
- * @param file
- * @return
- * @throws IOException
- */
- public static Document getDocument(String rootDir, File file)
- throws IOException {
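- Properties props = new Properties();
- // try-with-resources closes the stream even if load() fails
- try (FileInputStream in = new FileInputStream(file)) {
- props.load(in);
- }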
- Document doc = new Document();
- String category = file.getParent().substring(rootDir.length());
- category = category.replace(File.separatorChar, '/');
- String isbn = props.getProperty("isbn");
- String title = props.getProperty("title");
- String author = props.getProperty("author");
- String url = props.getProperty("url");
- String subject = props.getProperty("subject");
- String pubmonth = props.getProperty("pubmonth");
- System.out.println("title:" + title + "\n" + "author:" + author + "\n" + "subject:" + subject + "\n"
- + "pubmonth:" + pubmonth + "\n" + "category:" + category + "\n---------");
- doc.add(new StringField("isbn", isbn, Field.Store.YES));
- doc.add(new StringField("category", category, Field.Store.YES));
- doc.add(new SortedDocValuesField("category", new BytesRef(category)));
- doc.add(new TextField("title", title, Field.Store.YES));
- doc.add(new Field("title2", title.toLowerCase(), Field.Store.YES,
- Field.Index.NOT_ANALYZED_NO_NORMS,
- Field.TermVector.WITH_POSITIONS_OFFSETS));
- String[] authors = author.split(",");
- for (String a : authors) {
- doc.add(new Field("author", a, Field.Store.YES,
- Field.Index.NOT_ANALYZED,
- Field.TermVector.WITH_POSITIONS_OFFSETS));
- }
- doc.add(new Field("url", url, Field.Store.YES,
- Field.Index.NOT_ANALYZED_NO_NORMS));
- doc.add(new Field("subject", subject, Field.Store.YES,
- Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
- doc.add(new IntField("pubmonth", Integer.parseInt(pubmonth),
- Field.Store.YES));
- doc.add(new NumericDocValuesField("pubmonth", Integer.parseInt(pubmonth)));
- Date d = null;
- try {
- d = DateTools.stringToDate(pubmonth);
- } catch (ParseException pe) {
- throw new RuntimeException(pe);
- }
- doc.add(new IntField("pubmonthAsDay",
- (int) (d.getTime() / (1000 * 3600 * 24)), Field.Store.YES));
- for (String text : new String[] { title, subject, author, category }) {
- doc.add(new Field("contents", text, Field.Store.NO,
- Field.Index.ANALYZED,
- Field.TermVector.WITH_POSITIONS_OFFSETS));
- }
- return doc;
- }
- }
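Don't ask me why the indexing code above still uses the Field constructors that are already flagged as deprecated. My answer: because I feel like it! Don't sweat the details, I just wanted to mix things up a bit. All the code does is read every properties file under the data folder and write its contents into the index. I'll upload the properties data files used for testing as an attachment at the bottom.
Next, write the test class: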
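For readers without the attachment, here is roughly what one input file and the pubmonth handling look like. It is a JDK-only sketch: the property keys are the ones getDocument reads, but the sample values are made up, BookPropertiesSketch is a hypothetical name, and I am assuming that DateTools.stringToDate interprets yyyyMM as midnight GMT on the first day of the month.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDate;
import java.util.Properties;

/** Parses a made-up book description the way the indexer reads its input. */
final class BookPropertiesSketch {

    /** Hypothetical file content; only the keys match the indexing code. */
    static final String SAMPLE =
            "isbn=0000000000\n"
            + "title=Sample Book\n"
            + "author=Author One,Author Two\n"
            + "url=http://example.com/sample\n"
            + "subject=sample subject words\n"
            + "pubmonth=200403\n";

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Days since 1970-01-01 for the first day of a yyyyMM month. */
    static int pubmonthAsDay(String pubmonth) {
        int year = Integer.parseInt(pubmonth.substring(0, 4));
        int month = Integer.parseInt(pubmonth.substring(4, 6));
        return (int) LocalDate.of(year, month, 1).toEpochDay();
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        // Each author becomes its own value, as in the indexer's author loop.
        for (String author : props.getProperty("author").split(",")) {
            System.out.println("author: " + author);
        }
        System.out.println("pubmonthAsDay: " + pubmonthAsDay(props.getProperty("pubmonth")));
    }
}
```

Storing the month as a day count is what lets pubmonthAsDay be indexed as a plain int field.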
- package com.yida.framework.lucene5.sort;
- import java.io.IOException;
- import java.io.PrintStream;
- import java.nio.file.Paths;
- import java.text.DecimalFormat;
- import org.apache.commons.lang.StringUtils;
- import org.apache.lucene.analysis.standard.StandardAnalyzer;
- import org.apache.lucene.document.Document;
- import org.apache.lucene.index.DirectoryReader;
- import org.apache.lucene.index.IndexReader;
- import org.apache.lucene.queryparser.classic.QueryParser;
- import org.apache.lucene.search.BooleanClause;
- import org.apache.lucene.search.BooleanQuery;
- import org.apache.lucene.search.IndexSearcher;
- import org.apache.lucene.search.MatchAllDocsQuery;
- import org.apache.lucene.search.Query;
- import org.apache.lucene.search.ScoreDoc;
- import org.apache.lucene.search.Sort;
- import org.apache.lucene.search.SortField;
- import org.apache.lucene.search.SortField.Type;
- import org.apache.lucene.search.TopDocs;
- import org.apache.lucene.store.Directory;
- import org.apache.lucene.store.FSDirectory;
- public class SortingExample {
- private Directory directory;
- public SortingExample(Directory directory) {
- this.directory = directory;
- }
- public void displayResults(Query query, Sort sort)
- throws IOException {
- IndexReader reader = DirectoryReader.open(directory);
- IndexSearcher searcher = new IndexSearcher(reader);
- //searcher.setDefaultFieldSortScoring(true, false);
- // In Lucene 5.x the two scoring flags are passed as arguments to search() instead
- //searcher.search(query, filter, n, sort, doDocScores, doMaxScore);
- TopDocs results = searcher.search(query, null,
- 20, sort,true,false);
- System.out.println("\nResults for: " +
- query.toString() + " sorted by " + sort);
- System.out
- .println(StringUtils.rightPad("Title", 30)
- + StringUtils.rightPad("pubmonth", 10)
- + StringUtils.center("id", 4)
- + StringUtils.center("score", 15));
- PrintStream out = new PrintStream(System.out, true, "UTF-8");
- DecimalFormat scoreFormatter = new DecimalFormat("0.######");
- for (ScoreDoc sd : results.scoreDocs) {
- int docID = sd.doc;
- float score = sd.score;
- Document doc = searcher.doc(docID);
- out.println(StringUtils.rightPad(
- StringUtils.abbreviate(doc.get("title"), 29), 30) +
- StringUtils.rightPad(doc.get("pubmonth"), 10) +
- StringUtils.center("" + docID, 4) +
- StringUtils.leftPad(
- scoreFormatter.format(score), 12));
- out.println(" " + doc.get("category"));
- // out.println(searcher.explain(query, docID));
- }
- System.out.println("\n**************************************\n");
- reader.close();
- }
- public static void main(String[] args) throws Exception {
- String indexdir = "C:/lucenedir";
- Query allBooks = new MatchAllDocsQuery();
- QueryParser parser = new QueryParser("contents",new StandardAnalyzer());
- BooleanQuery query = new BooleanQuery();
- query.add(allBooks, BooleanClause.Occur.SHOULD);
- query.add(parser.parse("java OR action"), BooleanClause.Occur.SHOULD);
- Directory directory = FSDirectory.open(Paths.get(indexdir));
- SortingExample example = new SortingExample(directory);
- example.displayResults(query, Sort.RELEVANCE);
- example.displayResults(query, Sort.INDEXORDER);
- example.displayResults(query, new Sort(new SortField("category",
- Type.STRING)));
- example.displayResults(query, new Sort(new SortField("pubmonth",
- Type.INT, true)));
- example.displayResults(query, new Sort(new SortField("category",
- Type.STRING), SortField.FIELD_SCORE, new SortField(
- "pubmonth", Type.INT, true)));
- example.displayResults(query, new Sort(new SortField[] {
- SortField.FIELD_SCORE,
- new SortField("category", Type.STRING) }));
- directory.close();
- }
- }
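If you've understood the points above, you should be able to follow this test code. Still, one reminder: when constructing a Sort you can pass several SortFields to sort on multiple fields, for example: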
- new Sort(new SortField("category",
- Type.STRING), SortField.FIELD_SCORE, new SortField(
- "pubmonth", Type.INT, true))
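This sorts by the category field as a string in ascending order first, then by score in descending order, then by the pubmonth field compared numerically in descending order. In short, the sort keys are applied in the order in which you list the SortFields. Keep Sort's default behavior in mind.
Below is the output printed by a run. Compare it against the code and let it sink in: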
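The same three-level ordering can be written as an ordinary comparator, which may make the tie-breaking easier to see. This is a JDK-only sketch with made-up class names (MultiFieldSortSketch, Book); the rows are copied from the demo output, with titles abbreviated exactly as printed there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Imitates category asc, score desc, pubmonth desc; not Lucene code. */
final class MultiFieldSortSketch {

    static final class Book {
        final String title;
        final String category;
        final float score;
        final int pubmonth;

        Book(String title, String category, float score, int pubmonth) {
            this.title = title;
            this.category = category;
            this.score = score;
            this.pubmonth = pubmonth;
        }
    }

    /** Each key is consulted only when all earlier keys are tied. */
    static final Comparator<Book> ORDER = new Comparator<Book>() {
        @Override
        public int compare(Book a, Book b) {
            int c = a.category.compareTo(b.category);       // ascending
            if (c != 0) {
                return c;
            }
            c = Float.compare(b.score, a.score);            // descending
            if (c != 0) {
                return c;
            }
            return Integer.compare(b.pubmonth, a.pubmonth); // descending
        }
    };

    static List<Book> sampleBooks() {
        String prog = "/technology/computers/programming";
        return Arrays.asList(
                new Book("Ant in Action", prog, 1.052735f, 200707),
                new Book("Lipitor, Thief of Memory", "/health", 0.151398f, 200611),
                new Book("Tapestry in Action", prog, 0.447534f, 200403),
                new Book("Lucene in Action, Second E...", prog, 1.052735f, 201005),
                new Book("Nudge: Improving Decisions...", "/health", 0.151398f, 200804),
                new Book("A Modern Art of Education", "/education/pedagogy", 0.151398f, 200403));
    }

    static List<String> sortedTitles(List<Book> books) {
        List<Book> copy = new ArrayList<Book>(books);
        Collections.sort(copy, ORDER);
        List<String> titles = new ArrayList<String>();
        for (Book b : copy) {
            titles.add(b.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        for (String title : sortedTitles(sampleBooks())) {
            System.out.println(title);
        }
    }
}
```

Within /health both books have the same score, so pubmonth decides and the 200804 title comes before the 200611 one. Within the programming category the two books scored 1.052735 are likewise separated by pubmonth.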
- Results for: *:* (contents:java contents:action) sorted by <score>
- Title pubmonth id score
- Ant in Action 200707 6 1.052735
- /technology/computers/programming
- Lucene in Action, Second E... 201005 9 1.052735
- /technology/computers/programming
- Tapestry in Action 200403 11 0.447534
- /technology/computers/programming
- JUnit in Action, Second Ed... 201005 8 0.429442
- /technology/computers/programming
- A Modern Art of Education 200403 0 0.151398
- /education/pedagogy
- Imperial Secrets of Health... 199903 1 0.151398
- /health/alternative/chinese
- Lipitor, Thief of Memory 200611 2 0.151398
- /health
- Nudge: Improving Decisions... 200804 3 0.151398
- /health
- Tao Te Ching 道德經 200609 4 0.151398
- /philosophy/eastern
- Gödel, Escher, Bach: an Et... 199905 5 0.151398
- /technology/computers/ai
- Mindstorms: Children, Comp... 199307 7 0.151398
- /technology/computers/programming/education
- Extreme Programming Explained 200411 10 0.151398
- /technology/computers/programming/methodology
- The Pragmatic Programmer 199910 12 0.151398
- /technology/computers/programming
- **************************************
- Results for: *:* (contents:java contents:action) sorted by <doc>
- Title pubmonth id score
- A Modern Art of Education 200403 0 0.151398
- /education/pedagogy
- Imperial Secrets of Health... 199903 1 0.151398
- /health/alternative/chinese
- Lipitor, Thief of Memory 200611 2 0.151398
- /health
- Nudge: Improving Decisions... 200804 3 0.151398
- /health
- Tao Te Ching 道德經 200609 4 0.151398
- /philosophy/eastern
- Gödel, Escher, Bach: an Et... 199905 5 0.151398
- /technology/computers/ai
- Ant in Action 200707 6 1.052735
- /technology/computers/programming
- Mindstorms: Children, Comp... 199307 7 0.151398
- /technology/computers/programming/education
- JUnit in Action, Second Ed... 201005 8 0.429442
- /technology/computers/programming
- Lucene in Action, Second E... 201005 9 1.052735
- /technology/computers/programming
- Extreme Programming Explained 200411 10 0.151398
- /technology/computers/programming/methodology
- Tapestry in Action 200403 11 0.447534
- /technology/computers/programming
- The Pragmatic Programmer 199910 12 0.151398
- /technology/computers/programming
- **************************************
- Results for: *:* (contents:java contents:action) sorted by <string: "category">
- Title pubmonth id score
- A Modern Art of Education 200403 0 0.151398
- /education/pedagogy
- Lipitor, Thief of Memory 200611 2 0.151398
- /health
- Nudge: Improving Decisions... 200804 3 0.151398
- /health
- Imperial Secrets of Health... 199903 1 0.151398
- /health/alternative/chinese
- Tao Te Ching 道德經 200609 4 0.151398
- /philosophy/eastern
- Gödel, Escher, Bach: an Et... 199905 5 0.151398
- /technology/computers/ai
- Ant in Action 200707 6 1.052735
- /technology/computers/programming
- JUnit in Action, Second Ed... 201005 8 0.429442
- /technology/computers/programming
- Lucene in Action, Second E... 201005 9 1.052735
- /technology/computers/programming
- Tapestry in Action 200403 11 0.447534
- /technology/computers/programming
- The Pragmatic Programmer 199910 12 0.151398
- /technology/computers/programming
- Mindstorms: Children, Comp... 199307 7 0.151398
- /technology/computers/programming/education
- Extreme Programming Explained 200411 10 0.151398
- /technology/computers/programming/methodology
- **************************************
- Results for: *:* (contents:java contents:action) sorted by <int: "pubmonth">!
- Title pubmonth id score
- JUnit in Action, Second Ed... 201005 8 0.429442
- /technology/computers/programming
- Lucene in Action, Second E... 201005 9 1.052735
- /technology/computers/programming
- Nudge: Improving Decisions... 200804 3 0.151398
- /health
- Ant in Action 200707 6 1.052735
- /technology/computers/programming
- Lipitor, Thief of Memory 200611 2 0.151398
- /health
- Tao Te Ching 道德經 200609 4 0.151398
- /philosophy/eastern
- Extreme Programming Explained 200411 10 0.151398
- /technology/computers/programming/methodology
- A Modern Art of Education 200403 0 0.151398
- /education/pedagogy
- Tapestry in Action 200403 11 0.447534
- /technology/computers/programming
- The Pragmatic Programmer 199910 12 0.151398
- /technology/computers/programming
- Gödel, Escher, Bach: an Et... 199905 5 0.151398
- /technology/computers/ai
- Imperial Secrets of Health... 199903 1 0.151398
- /health/alternative/chinese
- Mindstorms: Children, Comp... 199307 7 0.151398
- /technology/computers/programming/education
- **************************************
- Results for: *:* (contents:java contents:action) sorted by <string: "category">,<score>,<int: "pubmonth">!
- Title pubmonth id score
- A Modern Art of Education 200403 0 0.151398
- /education/pedagogy
- Nudge: Improving Decisions... 200804 3 0.151398
- /health
- Lipitor, Thief of Memory 200611 2 0.151398
- /health
- Imperial Secrets of Health... 199903 1 0.151398
- /health/alternative/chinese
- Tao Te Ching 道德經 200609 4 0.151398
- /philosophy/eastern
- Gödel, Escher, Bach: an Et... 199905 5 0.151398
- /technology/computers/ai
- Lucene in Action, Second E... 201005 9 1.052735
- /technology/computers/programming
- Ant in Action 200707 6 1.052735
- /technology/computers/programming
- Tapestry in Action 200403 11 0.447534
- /technology/computers/programming
- JUnit in Action, Second Ed... 201005 8 0.429442
- /technology/computers/programming
- The Pragmatic Programmer 199910 12 0.151398
- /technology/computers/programming
- Mindstorms: Children, Comp... 199307 7 0.151398
- /technology/computers/programming/education
- Extreme Programming Explained 200411 10 0.151398
- /technology/computers/programming/methodology
- **************************************
- Results for: *:* (contents:java contents:action) sorted by <score>,<string: "category">
- Title pubmonth id score
- Ant in Action 200707 6 1.052735
- /technology/computers/programming
- Lucene in Action, Second E... 201005 9 1.052735
- /technology/computers/programming
- Tapestry in Action 200403 11 0.447534
- /technology/computers/programming
- JUnit in Action, Second Ed... 201005 8 0.429442
- /technology/computers/programming
- A Modern Art of Education 200403 0 0.151398
- /education/pedagogy
- Lipitor, Thief of Memory 200611 2 0.151398
- /health
- Nudge: Improving Decisions... 200804 3 0.151398
- /health
- Imperial Secrets of Health... 199903 1 0.151398
- /health/alternative/chinese
- Tao Te Ching 道德經 200609 4 0.151398
- /philosophy/eastern
- Gödel, Escher, Bach: an Et... 199905 5 0.151398
- /technology/computers/ai
- The Pragmatic Programmer 199910 12 0.151398
- /technology/computers/programming
- Mindstorms: Children, Comp... 199307 7 0.151398
- /technology/computers/programming/education
- Extreme Programming Explained 200411 10 0.151398
- /technology/computers/programming/methodology
- **************************************
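I wrote this in a hurry, so if anything is unclear or wrong, feel free to tear into me. Thanks!
I'll also upload the demo source as an attachment at the bottom. Before you run the test classes, remember to copy the test data files to the C: drive (CreateTestIndex reads them from C:/data).
OK, that's a wrap!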
If you have any other questions, add me on QQ: 7-3-6-0-3-1-3-0-5,
or join the QQ group
so we can learn together!
Reposted from: http://iamyida.iteye.com/blog/2197839