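Lessons learned
When a query mixes AND and OR, add parentheses explicitly to control how the sub-queries relate to each other.
AND and OR connect clauses.
+, - and NOT modify a single clause.
Queries are generally built as +(clause) -(clause), or +(clause AND clause) -(clause OR clause).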
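The construction rule above can be sketched as a small string helper. ExplicitGrouping, group, must and mustNot are hypothetical names made up for this illustration and are not part of Lucene. The sketch only builds the query text, and it assumes the intended meaning of the example query is height OR (weight AND shoes size).

```java
class ExplicitGrouping {
    // Joins clauses with the given operator (AND / OR) and always wraps them in parentheses,
    // so the grouping never depends on how the parser treats mixed operators.
    static String group(String op, String... clauses) {
        return "(" + String.join(" " + op + " ", clauses) + ")";
    }

    // Marks a clause (or a parenthesized group) as required.
    static String must(String clause) {
        return "+" + clause;
    }

    // Marks a clause (or a parenthesized group) as prohibited.
    static String mustNot(String clause) {
        return "-" + clause;
    }

    public static void main(String[] args) {
        // Assumed intent: height OR (weight AND shoes size)
        System.out.println(group("OR", "height:165.0",
                group("AND", "weight_num:21", "shoes_size:37")));
        // The +(clause AND clause) -(clause OR clause) shape from the rule above
        System.out.println(must(group("AND", "a:1", "b:2")) + " " + mustNot(group("OR", "c:3", "d:4")));
    }
}
```

Wrapping every group, including the outermost one, is redundant but keeps the helper trivial and the resulting string unambiguous.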
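Background
Routine Search4realstar work, with the Lucene QueryParser configured.
The input height:165.0 OR weight_num:21 AND shoes_size:37 returned no results.
The debug query output shows it was parsed as follows:
rawinput height:165.0 OR weight_num:21 AND shoes_size:37
queryinput height:165.0 OR weight_num:21 AND shoes_size:37
parsedquery height:165.0 +weight_num:21 +shoes_size:37 // the weight_num:21 sub-query was rewritten
As a result, the logic that is actually executed differs from the intended logic, and nothing matches.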
e.g.
raw input height:165.0 AND weight_num:21 AND shoes_size:37
query input height:165.0 AND weight_num:21 AND shoes_size:37
parsedquery +height:165.0 +weight_num:21 +shoes_size:37 // the height:165.0 sub-query was rewritten
e.g.
raw input height:165.0 AND weight_num:21 OR shoes_size:37
query input height:165.0 AND weight_num:21 OR shoes_size:37
parsed query +height:165.0 +weight_num:21 shoes_size:37 // the height:165.0 sub-query was rewritten
Tracing the problem
org.apache.lucene.queryParser.QueryParser
protected void addClause(List<BooleanClause> clauses, int conj, int mods, Query q) {
  boolean required, prohibited;
  // If this term is introduced by AND, make the preceding term required,
  // unless it's already prohibited
  if (clauses.size() > 0 && conj == CONJ_AND) {
    // This is where an earlier OR gets overridden: when an AND is encountered,
    // the occurrence of the preceding clause is rewritten to MUST.
    BooleanClause c = clauses.get(clauses.size()-1);
    if (!c.isProhibited())
      c.setOccur(BooleanClause.Occur.MUST);
  }
  // ... (rest of the method not shown)
}
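To see how that one step produces the parsed forms listed earlier, here is a minimal self-contained simulation. It is not Lucene code: AddClauseSketch, Clause and parse are hypothetical names, the default operator is assumed to be OR, and only whitespace-separated terms joined by AND / OR are handled (no +, -, NOT or parentheses).

```java
import java.util.ArrayList;
import java.util.List;

class AddClauseSketch {
    enum Occur { SHOULD, MUST }

    // Hypothetical stand-in for a boolean clause: a term plus its occurrence flag.
    static final class Clause {
        final String term;
        Occur occur;
        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
    }

    // Processes clauses strictly left to right, with no operator precedence.
    static String parse(String input) {
        List<Clause> clauses = new ArrayList<>();
        String conj = "NONE"; // conjunction seen just before the next term
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (token.equals("AND") || token.equals("OR")) {
                conj = token;
                continue;
            }
            boolean introducedByAnd = conj.equals("AND");
            // Same step as in the excerpt: AND promotes the preceding clause to MUST,
            // regardless of how that clause was introduced.
            if (!clauses.isEmpty() && introducedByAnd) {
                clauses.get(clauses.size() - 1).occur = Occur.MUST;
            }
            // A term introduced by AND is itself required; otherwise it stays optional.
            clauses.add(new Clause(token, introducedByAnd ? Occur.MUST : Occur.SHOULD));
            conj = "NONE";
        }
        StringBuilder out = new StringBuilder();
        for (Clause c : clauses) {
            if (out.length() > 0) out.append(' ');
            out.append(c.occur == Occur.MUST ? "+" : "").append(c.term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: height:165.0 +weight_num:21 +shoes_size:37
        System.out.println(parse("height:165.0 OR weight_num:21 AND shoes_size:37"));
        // prints: +height:165.0 +weight_num:21 shoes_size:37
        System.out.println(parse("height:165.0 AND weight_num:21 OR shoes_size:37"));
    }
}
```

Because each clause is only ever adjusted by the conjunction that follows it, AND does not bind tighter than OR here. Grouping has to come from explicit parentheses.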
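Further analysis of the Lucene QueryParser lexical and grammar rules
Parsing strategy: LL(1)
First L: the input is scanned from left to right
Second L: the parser produces a leftmost derivation
1: one token of lookahead is used to choose the next production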
Grammar rules
Query ::= ( Clause )*
Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
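The two productions map naturally onto a recursive-descent recognizer, one method per rule. The following is only a sketch: GrammarSketch and its tokenizer are hypothetical, conjunctions such as AND / OR are not treated specially (the grammar above does not mention them), and the optional field prefix is detected here by peeking at two tokens.

```java
import java.util.ArrayList;
import java.util.List;

class GrammarSketch {
    private final List<String> tokens;
    private int pos = 0;

    GrammarSketch(String input) { this.tokens = tokenize(input); }

    // Simplified tokenizer: "(", ")", ":" are always tokens; "+" / "-" only at the start of a word.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char ch : s.toCharArray()) {
            boolean special = ch == '(' || ch == ')' || ch == ':'
                    || ((ch == '+' || ch == '-') && word.length() == 0);
            if (special || Character.isWhitespace(ch)) {
                if (word.length() > 0) { out.add(word.toString()); word.setLength(0); }
                if (special) out.add(String.valueOf(ch));
            } else {
                word.append(ch);
            }
        }
        if (word.length() > 0) out.add(word.toString());
        return out;
    }

    private String peek(int k) { return pos + k < tokens.size() ? tokens.get(pos + k) : ""; }

    private static boolean isTerm(String t) {
        return !t.isEmpty() && !(t.length() == 1 && "()+-:".indexOf(t.charAt(0)) >= 0);
    }

    // Query ::= ( Clause )*
    String query() {
        List<String> parts = new ArrayList<>();
        while (!peek(0).isEmpty() && !peek(0).equals(")")) {
            parts.add(clause());
        }
        return String.join(" ", parts);
    }

    // Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
    String clause() {
        StringBuilder sb = new StringBuilder();
        if (peek(0).equals("+") || peek(0).equals("-")) sb.append(tokens.get(pos++));
        if (isTerm(peek(0)) && peek(1).equals(":")) {
            sb.append(tokens.get(pos++)).append(tokens.get(pos++)); // field name, then ":"
        }
        if (peek(0).equals("(")) {
            pos++;
            String inner = query();
            if (!peek(0).equals(")")) throw new IllegalArgumentException("missing ')'");
            pos++;
            return sb.append('(').append(inner).append(')').toString();
        }
        if (!isTerm(peek(0))) throw new IllegalArgumentException("term expected at token " + pos);
        return sb.append(tokens.get(pos++)).toString();
    }

    // Parses the whole input and returns it in normalized form, or throws on a syntax error.
    static String parse(String input) {
        GrammarSketch p = new GrammarSketch(input);
        String result = p.query();
        if (p.pos < p.tokens.size()) throw new IllegalArgumentException("unexpected token: " + p.peek(0));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("+( height:165.0   weight_num:21 ) -shoes_size:37"));
    }
}
```

The nested call from clause() back into query() is what lets a parenthesized group act as a single clause, which is why explicit parentheses give full control over grouping.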
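For the rules and their implementation, see QueryParser.jj (lexical and grammar definitions) and QueryParser.java.
The layering can be sketched in pseudo-form as follows:
Top level
-->TopLevelQuery
---->Query level
------>Clause level
-------->BooleanClause level
---------->TermQuery level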
Reference links, mainly walkthroughs of the lexical and grammar rules
http://zhaohe162.blog.163.com/blog/static/382167972011112252210215/
http://zhaohe162.blog.163.com/blog/static/382167972011112252312800/