
I am using ElasticSearch and Lucene with the standard analyzer. When the query is "Paleo", I do not want my index to return results for "Paleontology". I do, however, want it to return results for "Paleolithic", which is related to "Paleo". In other words, I want the analyzer to be smarter: it should filter out stems that are unrelated to the keyword while keeping the ones that are related. What solutions are available?

Why can't you just query the one you want? - MITjanitor
The user is querying for "Paleo", and my index is returning results for "Paleontology", which is not what I want. How can I make Lucene/ES smarter? - Henley
In Lucene syntax you could do something like Paleo* NOT "Paleontology" - MITjanitor
Great, that's somewhat useful. But is there a way to make Lucene automatically ignore stems that are unrelated to the query? - Henley
Can you describe how you are searching? Is this a prefix query, or are you using a stem filter, or something else? Also, do you intend to define rules like this manually? - femtoRgon

1 Answer


Implement your own stemming filter (or extend an existing one). The standard analyzer doesn't do any stemming, so I'm not sure which stemmer you're actually using. For reference, here is the Porter stem filter in Lucene:

http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html

If this seems too complex, you could put a stop word filter (StopFilter) after your stemmer and simply reject the tokens you don't want.
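
For the stop-filter route, here is a rough sketch of what the index settings might look like in ElasticSearch, built as a Python dict so it can be dumped to JSON and sent as the body of an index-creation request. The names `blocked_stems` and `stem_then_block` are made up for illustration, and `"paleontolog"` is only my guess at what the stemmer emits for "paleontology"; check the real output (for example with the `_analyze` API) before relying on it.

```python
import json


def build_index_settings(blocked_stems):
    """Build an index-creation body whose analyzer stems tokens, then drops blocked ones."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; "stop" is the built-in stop token filter type.
                    "blocked_stems": {
                        "type": "stop",
                        "stopwords": sorted(blocked_stems),
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "stem_then_block": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Order matters: lowercase first, then stem, then reject blocked tokens.
                        "filter": ["lowercase", "porter_stem", "blocked_stems"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # The stop list is matched against tokens *after* stemming, so it should hold
    # the stemmed form. Both spellings are listed here because the exact stem is
    # an assumption.
    body = build_index_settings({"paleontology", "paleontolog"})
    print(json.dumps(body, indent=2))
```

Keep in mind that a token rejected this way never reaches the index for fields using this analyzer, so those documents will not match on "Paleontology" for any query, not just for "Paleo". If you still need to find them by their full name, index the text into a second field with a different analyzer.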