Hibernate Search: sorting with a collation

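I upgraded Hibernate Search from version 4.3.0.Final to the latest stable version, 5.4.12.Final. Everything works well except sorting Norwegian words. In the old version, SortField had a constructor that took a locale: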

/** Creates a sort, possibly in reverse, by terms in the given field sorted
  * according to the given locale.
  * @param field  Name of field to sort by, cannot be <code>null</code>.
  * @param locale Locale of values in the field.
  */
 public SortField (String field, Locale locale, boolean reverse) {
   initFieldType(field, STRING);
   this.locale = locale;
   this.reverse = reverse;
 }

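In the new Hibernate Search, however, SortField no longer takes a locale. According to the Hibernate Search reference documentation (https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_analysis), sorting words in a foreign language should be done with CollationKeyFilterFactory combined with a normalizer. But that class does not exist in this version of Hibernate Search. Maven pom: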

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-orm</artifactId>
   <version>5.11.5.Final</version>
</dependency>

Question: what should I use or create to sort Norwegian words in Hibernate Search?

This is the sort order I get now:

atest,btest,ctest,ztest,åtest,ætest,øtest

The correct order is:

atest,btest,ctest,ztest,ætest,øtest,åtest
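
For reference, the difference between the two orders can be checked outside Hibernate Search with the JDK's own java.text.Collator. The sketch below is only an illustration: the class and method names (NorwegianSortDemo, sortByCodePoint, sortNorwegian) are made up for this example, and it assumes the JDK's Norwegian locale data is available. A plain string sort compares UTF-16 code units, which puts å (U+00E5) before æ (U+00E6) and ø (U+00F8); a Norwegian collator should place å last.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NorwegianSortDemo {

    // Unicode escapes keep the source independent of the file encoding:
    // U+00E5 is a-with-ring, U+00E6 is the ae ligature, U+00F8 is o-with-stroke.
    static final List<String> WORDS = Arrays.asList(
            "ztest", "\u00E5test", "btest", "\u00F8test", "atest", "\u00E6test", "ctest");

    /** Sorts by natural String order, i.e. by UTF-16 code unit values. */
    static List<String> sortByCodePoint(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null); // null comparator means natural ordering
        return copy;
    }

    /** Sorts with the JDK collation rules for the Norwegian locale. */
    static List<String> sortNorwegian(List<String> words) {
        Collator collator = Collator.getInstance(new Locale("no", "NO"));
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator); // Collator is a Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("code unit order: " + sortByCodePoint(WORDS));
        System.out.println("Norwegian order: " + sortNorwegian(WORDS));
    }
}
```

The goal is therefore to get the second ordering into the index, since the sort itself no longer accepts a locale.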

There is a CollationKeyAnalyzer class, but I don't know how to use it for sorting:

public final class CollationKeyAnalyzer extends Analyzer {
  private final CollationAttributeFactory factory;

  /**
   * Create a new CollationKeyAnalyzer, using the specified collator.
   *
   * @param collator CollationKey generator
   */
  public CollationKeyAnalyzer(Collator collator) {
    this.factory = new CollationAttributeFactory(collator);
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    KeywordTokenizer tokenizer = new KeywordTokenizer(factory, KeywordTokenizer.DEFAULT_BUFFER_SIZE);
    return new TokenStreamComponents(tokenizer, tokenizer);
  }
}

Question source: Stack Overflow

montos 2020-03-22 20:01:59
1 answer
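  • To solve the sorting problem I created my own NorwegianCollationFactory. It is not a perfect solution, because I copied code from the old version of Hibernate Search (IndexableBinaryStringTools.class), but it works well. The NorwegianCollationFactory class: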

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.util.TokenFilterFactory;
    
    import java.text.Collator;
    import java.util.Locale;
    import java.util.Map;
    
    public final class NorwegianCollationFactory extends TokenFilterFactory {
    
      public NorwegianCollationFactory(Map<String, String> args) {
          super(args);
      }
    
      @Override
      public TokenStream create(TokenStream input) {
          Collator norwegianCollator = Collator.getInstance(new Locale("no", "NO"));
          return new CollationKeyFilter(input, norwegianCollator);
      }
    
    }
    

    The CollationKeyFilter class:

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    
    import java.io.IOException;
    import java.text.Collator;
    import java.util.Objects;
    
    public final class CollationKeyFilter extends TokenFilter {
    
      // This code is copied from IndexableBinaryStringTools.class from the old version of hibernate search  4.3.0.Final
      private static final CollationKeyFilter.CodingCase[] CODING_CASES = {
              new CollationKeyFilter.CodingCase(7, 1),
              new CollationKeyFilter.CodingCase(14, 6, 2),
              new CollationKeyFilter.CodingCase(13, 5, 3),
              new CollationKeyFilter.CodingCase(12, 4, 4),
              new CollationKeyFilter.CodingCase(11, 3, 5),
              new CollationKeyFilter.CodingCase(10, 2, 6),
              new CollationKeyFilter.CodingCase(9, 1, 7),
              new CollationKeyFilter.CodingCase(8, 0)
      };
    
      private final Collator collator;
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    
      public CollationKeyFilter(TokenStream input, Collator collator) {
          super(input);
          this.collator = (Collator) collator.clone();
      }
    
      @Override
      public boolean incrementToken() throws IOException {
          if (input.incrementToken()) {
              byte[] collationKey = collator.getCollationKey(termAtt.toString()).toByteArray();
              int encodedLength = getBinaryStringEncodedLength(collationKey.length);
              termAtt.resizeBuffer(encodedLength);
              termAtt.setLength(encodedLength);
              encodeToBinaryString(collationKey, collationKey.length, termAtt.buffer());
              return true;
          } else {
              return false;
          }
      }
    
      // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search  4.3.0.Final
      private void encodeToBinaryString(byte[] inputArray, int inputLength, char[] outputArray) {
          if (inputLength > 0) {
              int inputByteNum = 0;
              int caseNum = 0;
              int outputCharNum = 0;
              CollationKeyFilter.CodingCase codingCase;
              for (; inputByteNum + CODING_CASES[caseNum].numBytes <= inputLength; ++outputCharNum) {
                  codingCase = CODING_CASES[caseNum];
                  if (codingCase.numBytes == 2) {
                      outputArray[outputCharNum] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                              + (((inputArray[inputByteNum + 1] & 0xFF) >>> codingCase.finalShift) & codingCase.finalMask) & (short) 0x7FFF);
                  } else {
                      outputArray[outputCharNum] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                              + ((inputArray[inputByteNum + 1] & 0xFF) << codingCase.middleShift)
                              + (((inputArray[inputByteNum + 2] & 0xFF) >>> codingCase.finalShift) & codingCase.finalMask) & (short) 0x7FFF);
                  }
                  inputByteNum += codingCase.advanceBytes;
                  if (++caseNum == CODING_CASES.length) {
                      caseNum = 0;
                  }
              }
              codingCase = CODING_CASES[caseNum];
              if (inputByteNum + 1 < inputLength) {
                  outputArray[outputCharNum++] = (char) ((((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                          + ((inputArray[inputByteNum + 1] & 0xFF) << codingCase.middleShift)) & (short) 0x7FFF);
                  outputArray[outputCharNum] = (char) 1;
              } else if (inputByteNum < inputLength) {
                  outputArray[outputCharNum++] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift) & (short) 0x7FFF);
                  outputArray[outputCharNum] = caseNum == 0 ? (char) 1 : (char) 0;
              } else {
                  outputArray[outputCharNum] = (char) 1;
              }
          }
      }
    
      // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
      private int getBinaryStringEncodedLength(int inputLength) {
          return (int) ((8L * inputLength + 14L) / 15L) + 1;
      }
    
      // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
      private static class CodingCase {
          int numBytes;
          int initialShift;
          int middleShift;
          int finalShift;
          int advanceBytes = 2;
          short middleMask;
          short finalMask;
    
          CodingCase(int initialShift, int middleShift, int finalShift) {
              this.numBytes = 3;
              this.initialShift = initialShift;
              this.middleShift = middleShift;
              this.finalShift = finalShift;
              this.finalMask = (short) ((short) 0xFF >>> finalShift);
              this.middleMask = (short) ((short) 0xFF << middleShift);
          }
    
          CodingCase(int initialShift, int finalShift) {
              this.numBytes = 2;
              this.initialShift = initialShift;
              this.finalShift = finalShift;
              this.finalMask = (short) ((short) 0xFF >>> finalShift);
              if (finalShift != 0) {
                  advanceBytes = 1;
              }
          }
      }
    
      @Override
      public boolean equals(Object o) {
          if (this == o) {
              return true;
          }
          if (o == null || getClass() != o.getClass()) {
              return false;
          }
          if (!super.equals(o)) {
              return false;
          }
          CollationKeyFilter that = (CollationKeyFilter) o;
          return Objects.equals(collator, that.collator) &&
                  Objects.equals(termAtt, that.termAtt);
      }
    
      @Override
      public int hashCode() {
          return Objects.hash(super.hashCode(), collator, termAtt);
      }
    
    }
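
    As a rough check of why indexing collation keys gives the right sort order, the sketch below compares the raw key bytes directly. It relies on the documented contract of java.text.CollationKey.toByteArray(): comparing the byte arrays, most significant byte first, gives the same result as comparing the keys. The class and method names (CollationKeyBytesDemo, compareUnsigned, sortByKeyBytes) are made up for this illustration and are not part of Hibernate Search or Lucene. The encodedLength helper repeats the length formula from the filter above: 8 bits per input byte are packed into 15-bit chars, rounded up, plus one trailing char.

    ```java
    import java.text.Collator;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;

    final class CollationKeyBytesDemo {

        /** Compares two byte arrays as unsigned values, most significant byte first. */
        static int compareUnsigned(byte[] a, byte[] b) {
            int common = Math.min(a.length, b.length);
            for (int i = 0; i < common; i++) {
                int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
                if (diff != 0) {
                    return diff;
                }
            }
            return a.length - b.length; // a shorter prefix sorts first
        }

        /** Sorts words by the raw bytes of their collation keys. */
        static List<String> sortByKeyBytes(Collator collator, List<String> words) {
            List<String> copy = new ArrayList<>(words);
            copy.sort((x, y) -> compareUnsigned(
                    collator.getCollationKey(x).toByteArray(),
                    collator.getCollationKey(y).toByteArray()));
            return copy;
        }

        /** Sorts words with the collator itself, for comparison. */
        static List<String> sortByCollator(Collator collator, List<String> words) {
            List<String> copy = new ArrayList<>(words);
            copy.sort(collator);
            return copy;
        }

        /** Same formula as getBinaryStringEncodedLength in the filter above. */
        static int encodedLength(int inputLength) {
            return (int) ((8L * inputLength + 14L) / 15L) + 1;
        }

        public static void main(String[] args) {
            Collator collator = Collator.getInstance(new Locale("no", "NO"));
            // U+00E5, U+00F8 and U+00E6 written as escapes to avoid encoding issues.
            List<String> words = Arrays.asList(
                    "ztest", "\u00E5test", "btest", "\u00F8test", "atest", "\u00E6test", "ctest");
            System.out.println(sortByKeyBytes(collator, words));
            System.out.println(sortByCollator(collator, words));
            System.out.println(encodedLength(15)); // 120 bits fill 8 chars, plus 1 trailer
        }
    }
    ```

    Both sorts should print the same list, which is the property the filter depends on: once the key is stored as the field's term, an ordinary term sort follows the collator's order.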
    

    Entity mapping example:

    @javax.persistence.Entity // fully qualified because the example class below is itself named Entity
    @NormalizerDef(name = "textSortNormalizer",
          filters = {
                  @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                  @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                          @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
                          @Parameter(name = "replacement", value = " "),
                          @Parameter(name = "replace", value = "all")
                  }),
                  @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                          @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
                          @Parameter(name = "replacement", value = ""),
                          @Parameter(name = "replace", value = "all")
                  }),
                  @TokenFilterDef(factory = NorwegianCollationFactory.class)
          }
    )
    public class Entity {
    
      @Field(name = "name_for_sort", normalizer = @Normalizer(definition = "textSortNormalizer"))
      @SortableField(forField = "name_for_sort")
      private String name;
    
    }
    

    Answer source: Stack Overflow

    2020-03-22 20:03:24