Lucene.NET 2.9 - MultiFieldQueryParser, boosted fields, stemming and prefixes

Question

I have a system where the search queries multiple fields with different boost values. It runs on Lucene.NET 2.9.4 because it's an Umbraco (6.x) site, and that is the version of Lucene.NET the CMS uses.

My client asked me if I could add stemming, so I wrote a custom analyzer that does Standard / Lowercase / Stop / PorterStemmer. The stemming filter seems to work fine.

But now, when I try to use my new analyzer with the MultiFieldQueryParser, it's not finding anything.

The MultiFieldQueryParser is returning a query containing stemmed words - e.g. if I search for "the figure", what I get as part of the query it returns is:

keywords:figur^4.0 Title:figur^3.0 Collection:figur^2.0

i.e. it's searching the correct fields and applying the correct boosts, but it is doing an exact search for stemmed terms against an index that contains unstemmed words.

I think what's actually needed is for the MultiFieldQueryParser to return a list of clauses of type PrefixQuery, so that it outputs a query like:

keywords:figur*^4.0 Title:figur*^3.0 Collection:figur*^2.0

If I just add a wildcard to the end of the term and feed that into the parser, the stemmer doesn't kick in, i.e. it builds a query that looks for "figure*".

Is there any way to combine MultiFieldQueryParser boosting and prefix queries?

femtoRgon · Accepted Answer · 2015-01-12T17:56:04

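You need to reindex using your custom analyzer. Applying a stemmer only at query time is useless, because the stemmed query terms (e.g. "figur") will not match the unstemmed terms (e.g. "figure") stored in the index. You might kludge together something using wildcards, but it would remain an ugly, unreliable kludge.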
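To illustrate why the analyzer has to be applied at index time as well, here is a minimal sketch in Python. It is not Lucene.NET code: the inverted index, the `toy_stem` function (a crude stand-in for the Porter stemmer), the stop-word list, and the sample documents are all hypothetical and only meant to reproduce the "figure" / "figur" mismatch described in the question.

```python
import re

# Hypothetical stop-word list, standing in for the analyzer's StopFilter.
STOP_WORDS = {"the", "a", "an", "of"}


def toy_stem(token):
    """Crude stand-in for a Porter stemmer: strips a few common suffixes."""
    for suffix in ("es", "e", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    """Lowercase, tokenize, drop stop words, and optionally stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens] if stem else tokens


def build_index(docs, stem):
    """Build a term -> set-of-doc-ids inverted index using the analyzer."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stem):
    """Exact-term lookup: union of the postings for each analyzed query term."""
    hits = set()
    for term in analyze(query, stem):
        hits |= index.get(term, set())
    return hits


# Hypothetical sample documents.
docs = {1: "The Figure in the Carpet", 2: "Figures and landscapes"}

# Index built WITHOUT stemming, queried WITH stemming (the situation in the question).
unstemmed_index = build_index(docs, stem=False)
print(sorted(search(unstemmed_index, "the figure", stem=True)))  # prints []

# Index rebuilt WITH the same stemming analyzer that is used for the query.
stemmed_index = build_index(docs, stem=True)
print(sorted(search(stemmed_index, "the figure", stem=True)))  # prints [1, 2]
```

In the first case the query term "figur" has no entry in the index, which only holds "figure" and "figures". Once the index is rebuilt with the same analysis chain, both documents are stored under "figur" and the exact-term query matches them, with no need for prefix queries.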
