I serialized a BooleanQuery built from TermQuery objects into a string. Now I am trying to deserialize that string back into a BooleanQuery on a different node in a distributed system. The query spans multiple fields, and I do not want to use an analyzer while deserializing.

For example, I am trying to parse the string below without analyzing it:

+contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

QueryParser in Lucene requires an analyzer, but I want the field values above to be treated as terms verbatim. I am looking for a query parser that produces the equivalent of the code below, because I do not want to parse the string and construct the query myself.

TermQuery q1 = new TermQuery(new Term("contents", "maxItemsPerBlock"));
TermQuery q2 = new TermQuery(new Term("path", "/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java"));
BooleanQuery q = new BooleanQuery();
q.add(q1, BooleanClause.Occur.MUST);
q.add(q2, BooleanClause.Occur.MUST);
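
If the serialized form only ever contains flat clauses like the ones above (an optional +, a field name, a colon, and a value with no whitespace), the string can be split without any analyzer. The following is a minimal sketch under that assumption, using only the Java standard library; FlatClauseSplitter and Clause are hypothetical names, not Lucene classes, and nested or parenthesized queries are not handled. Each resulting clause could then be turned into a TermQuery and added to a BooleanQuery exactly as in the code above.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatClauseSplitter {

    /** One parsed clause: whether it was prefixed with '+', its field, and its raw term text. */
    static final class Clause {
        final boolean required;
        final String field;
        final String text;

        Clause(boolean required, String field, String text) {
            this.required = required;
            this.field = field;
            this.text = text;
        }
    }

    /** Splits a flat "+field:value +field:value" string at whitespace; values are kept verbatim. */
    static List<Clause> split(String serialized) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : serialized.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            boolean required = token.startsWith("+");
            String body = required ? token.substring(1) : token;
            // Only the first colon separates field from value; later colons stay in the value.
            int colon = body.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected field:value but got: " + token);
            }
            clauses.add(new Clause(required, body.substring(0, colon), body.substring(colon + 1)));
        }
        return clauses;
    }
}
```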

Also, when I tried using a WhitespaceAnalyzer with a QueryParser, I got an "IllegalArgumentException: field must not be null" error. Below are the sample code and the resulting stack trace:

Analyzer analyzer = new WhitespaceAnalyzer();
String field = "contents";
QueryParser parser = new QueryParser(null, analyzer);
Query query = parser.parse("+contents:maxItemsPerBlock +path:/home/rchallapalli/Desktop/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java");

java.lang.IllegalArgumentException: field must not be null
at org.apache.lucene.search.MultiTermQuery.<init>(MultiTermQuery.java:233)
at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:99)
at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:81)
at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:108)
at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:93)
at org.apache.lucene.queryparser.classic.QueryParserBase.newRegexpQuery(QueryParserBase.java:572)
at org.apache.lucene.queryparser.classic.QueryParserBase.getRegexpQuery(QueryParserBase.java:774)
at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:844)
at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:348)
at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:247)
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:202)
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:160)
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:117)
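
A possible contributing factor, judging only from the trace above: the frames pass through getRegexpQuery and RegexpQuery, and the classic query parser in Lucene 4.0 and later reads text between two forward slashes as a regular expression. With the path left unescaped, a segment between two slashes can end up as a regexp on the default field, which is null here. The following is a minimal sketch, using plain string operations and a hypothetical helper named SlashEscaper, of escaping the slashes before the string reaches the parser. It assumes the input contains no slashes that are already escaped.

```java
public class SlashEscaper {

    /**
     * Prefixes every forward slash with a backslash so the classic
     * query syntax treats it as a literal character, not a regexp delimiter.
     * Assumes no slash in the input is already escaped.
     */
    static String escapeSlashes(String query) {
        return query.replace("/", "\\/");
    }

    public static void main(String[] args) {
        String raw = "+contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java";
        System.out.println(escapeSlashes(raw));
    }
}
```

The escaped string would then be passed to parser.parse(...). QueryParser also has a static escape(String) method, but it escapes +, : and the other syntax characters as well, so it is better suited to individual values than to a whole query string.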
The error message seems fairly clear: the first arg to the QueryParser ctor can't be null. If you don't care about the field argument, just pass it a garbage field name: new QueryParser("none", analyzer); - femtoRgon

1 Answer

Given the query text in your question, WhitespaceAnalyzer, which splits tokens only at whitespace, may be a suitable choice.

The terms you put into each TermQuery before serializing are exactly what you want to match in the Lucene index, so the parser only needs to split the string at whitespace and leave each value unchanged.

// Code in Scala
val parser = new QueryParser(version, "", new WhitespaceAnalyzer(version))
val parsedQuery = parser.parse(searchString)

I tried the following two cases, a single-valued field and a multi-valued field, and both work:

 +contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

 +(contents:maxItemsPerBlock contents:minItemsPerBlock) +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

Also, in our system, Query objects are passed between nodes using Java's ObjectOutputStream and ObjectInputStream for serialization and deserialization. You could try that approach, so you would not have to deal with the analyzer at all.
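
The object-stream approach can be sketched as below. This is only an illustration of the round trip, not the code from our system: TermSpec is a hypothetical Serializable stand-in, not a Lucene class, and RoundTrip is a made-up helper name. ObjectOutputStream only accepts objects that implement java.io.Serializable, so whether this works directly on Query objects should be checked for the Lucene version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class RoundTrip {

    /** Hypothetical stand-in for the object sent between nodes. */
    static final class TermSpec implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        TermSpec(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Serializes an object to bytes, as the sending node would. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the object from bytes, as the receiving node would. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class missing on the receiving side", e);
        }
    }
}
```

Because the field and term text travel as object state rather than as query syntax, no parsing or analysis happens on the receiving side.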