Multiple words query in Lucene

Question

For example, there is a field "description" in a Lucene document, and its content is [hello foo bar]. I want the query [hello f] to hit the document, while [hello ff] or [hello b] should not hit it.

I build the Query programmatically: PrefixQuery and TermQuery instances are added to a BooleanQuery, but it doesn't work as expected. StandardAnalyzer is used.

Test cases:

a) new PrefixQuery(new Term("description", "hello f")) -> 0 hits

b) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f*")) -> 0 hits

c) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f")) -> 0 hits
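To see why all three test cases return 0 hits, here is a plain-Java sketch that roughly imitates StandardAnalyzer at index time. It only lowercases and splits on non-alphanumeric characters, while the real analyzer follows Unicode word-break rules and, depending on the version, may also drop stop words. The class and method names are hypothetical and are not part of Lucene. The sketch shows that the field is indexed as the separate terms hello, foo and bar. A single Term whose text is "hello f" or "hello f*" therefore can never equal, or be a prefix of, any indexed term. PhraseQuery also does not treat * as a wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: a rough stand-in for StandardAnalyzer tokenization, not Lucene code. */
public class WhyNoHits {

    /** Lowercases the text and splits it on any run of non-letter, non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // a leading delimiter yields an empty first element
                terms.add(part);
            }
        }
        return terms;
    }

    /** Mimics a PrefixQuery on one field: true if any indexed term starts with the prefix. */
    static boolean anyTermStartsWith(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("hello foo bar");
        System.out.println(indexed);                               // [hello, foo, bar]
        System.out.println(anyTermStartsWith(indexed, "hello f")); // false: no indexed term contains a space
        System.out.println(anyTermStartsWith(indexed, "f"));       // true: matches "foo"
    }
}
```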

Any recommendations? Thanks!

What have you tried? Can you show some code snippets? This will help us understand your problem much better. — The Dark Knight
Have you tried using org.apache.lucene.queryParser.QueryParser to parse a query string such as "description: hello AND description: f*"? — pabrantes
@pabrantes "description: hello AND description: f*" is not what I want; I want "hello" followed by "f". — 卢声远 Shengyuan Lu
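The asker is describing phrase-prefix matching. Every query word except the last must match a document word exactly, and the words must be adjacent and in order. The last query word only needs to be a prefix of the next document word. The plain-Java sketch below pins down those semantics against the examples from the question. Its names are hypothetical, it is not a Lucene API, and it uses a deliberately simplified lowercase-and-whitespace tokenizer. In Lucene terms, this is roughly a phrase query whose last position is expanded to every indexed term that starts with the given prefix.

```java
import java.util.Locale;

/** Hypothetical illustration of phrase-prefix semantics; not Lucene code. */
public class PhrasePrefixSketch {

    /** Simplified tokenizer: lowercase, then split on whitespace. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /**
     * True if the query words appear consecutively and in order in the document,
     * with all but the last matching exactly and the last matching as a prefix.
     */
    static boolean matches(String document, String query) {
        String[] doc = tokenize(document);
        String[] q = tokenize(query);
        if (q.length == 0) {
            return false;
        }
        int last = q.length - 1;
        for (int start = 0; start + q.length <= doc.length; start++) {
            boolean ok = true;
            for (int i = 0; i < last && ok; i++) {
                ok = doc[start + i].equals(q[i]); // exact match for leading words
            }
            if (ok && doc[start + last].startsWith(q[last])) { // prefix match for the last word
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "hello foo bar";
        System.out.println(matches(doc, "hello f"));  // true
        System.out.println(matches(doc, "hello ff")); // false
        System.out.println(matches(doc, "hello b"));  // false: "b..." is not right after "hello"
    }
}
```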

Adam Dyga · Accepted Answer · 2012-12-17T10:09:48

It doesn't work because you are passing multiple terms to one Term object. If you want all your search words to be found by prefix, you need to:

1. Tokenize the input string with your analyzer. It will split your search text "hello f" into "hello" and "f":

TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(searchText));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute.class);

List<String> tokens = new ArrayList<String>();
tokenStream.reset(); // Lucene 4.x requires reset() before the first incrementToken()
while (tokenStream.incrementToken()) {
    tokens.add(termAttribute.toString());
}
tokenStream.end();
tokenStream.close();

2. Put each token into a Term object, wrap each Term in a PrefixQuery, and add all the PrefixQueries to a BooleanQuery.

EDIT: For example like this:

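// Every token must match as a prefix of some term in the field (word order is not enforced)
BooleanQuery booleanQuery = new BooleanQuery();

for (String token : tokens) {
    booleanQuery.add(new PrefixQuery(new Term(fieldName, token)), BooleanClause.Occur.MUST);
}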
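Here is a plain-Java sketch of what the BooleanQuery above matches. The names are hypothetical and this is not Lucene code. When every clause is a MUST PrefixQuery, a document hits if each query token is a prefix of at least one of its terms, at any position. Run against the question's examples, the sketch shows that [hello f] hits and [hello ff] does not. However, [hello b] also hits, because "bar" starts with "b" and the clauses do not constrain word order. If order matters, one option worth checking in the Lucene documentation for your version is a SpanNearQuery with slop 0 and inOrder set to true, with the last token's PrefixQuery wrapped in a SpanMultiTermQueryWrapper.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of a BooleanQuery whose clauses are all MUST PrefixQuery; not Lucene code. */
public class MustPrefixSketch {

    /** True if every query token is a prefix of at least one indexed term (order is ignored). */
    static boolean matches(List<String> indexedTerms, List<String> queryTokens) {
        if (queryTokens.isEmpty()) {
            return false; // an empty BooleanQuery matches no documents
        }
        for (String token : queryTokens) {
            boolean clauseMatched = false;
            for (String term : indexedTerms) {
                if (term.startsWith(token)) {
                    clauseMatched = true;
                    break;
                }
            }
            if (!clauseMatched) {
                return false; // a MUST clause failed, so the whole query fails
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("hello", "foo", "bar");
        System.out.println(matches(indexed, Arrays.asList("hello", "f")));  // true
        System.out.println(matches(indexed, Arrays.asList("hello", "ff"))); // false
        System.out.println(matches(indexed, Arrays.asList("hello", "b")));  // true: "bar" starts with "b"
    }
}
```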
