I have a field that I am indexing with Lucene like so:
@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {
The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY. When these values are indexed with the StandardAnalyzer, the indexed terms end up as just hungry and slightly, since the analyzer splits on the underscore and drops "not" as a stop word.
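You can see this by running a value through the analyzer directly. A minimal sketch against the Lucene 3.0 API (the field name here is just a label; error handling omitted):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

// dump the tokens StandardAnalyzer produces for one of the enum values
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
TokenStream stream = analyzer.tokenStream("hungerState", new StringReader("NOT_HUNGRY"));
TermAttribute term = stream.addAttribute(TermAttribute.class);
while (stream.incrementToken()) {
    System.out.println(term.term()); // prints just "hungry"; "not" is a stop word
}
stream.close();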
If I change the mapping to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.
My search API has a single search method that constructs the Query like so:
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);
This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", as well as searches where searchTerms specifies fields and values (e.g. "hungerState:HUNGRY").
My problem is with the latter scenario. Since the query parser uses a StandardAnalyzer, a search for hungerState:SLIGHTLY_HUNGRY gets parsed into hungerState:"slightly hungry", and a search for hungerState:NOT_HUNGRY gets parsed into hungerState:hungry.
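Printing the parsed queries confirms this; a quick sketch (parsed output shown in the comments, ParseException handling omitted):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

QueryParser parser = new QueryParser(Version.LUCENE_30, "hungerState", new StandardAnalyzer(Version.LUCENE_30));
System.out.println(parser.parse("hungerState:SLIGHTLY_HUNGRY")); // hungerState:"slightly hungry"
System.out.println(parser.parse("hungerState:NOT_HUNGRY"));      // hungerState:hungry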
When the field is indexed using the StandardAnalyzer, I get unexpected results: searches for HUNGRY and NOT_HUNGRY return results for all three values. When the field is indexed as UN_TOKENIZED, I get no results at all, since the query parser tokenizes and lowercases the search string, so the query terms never match the upper-case indexed terms.
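My understanding is that a hand-built TermQuery, which skips analysis entirely, should match the UN_TOKENIZED terms exactly; a sketch of what I mean:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// goes straight to the index with no analysis, so the term must
// match the indexed token exactly, including case
Query exact = new TermQuery(new Term("hungerState", "NOT_HUNGRY"));

But that defeats the point of having one flexible query-string entry point, which is why I'm using MultiFieldQueryParser in the first place.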
I've even tried specifying an analyzer for indexing, such as KeywordAnalyzer, but it pretty much has no effect, since the entire search string is still analyzed with the StandardAnalyzer at query time.
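For reference, the indexing-side override I mean looks roughly like this, assuming Hibernate Search's @Analyzer annotation:

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.hibernate.search.annotations.Analyzer;

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
@Analyzer(impl = KeywordAnalyzer.class) // indexes each value as a single, unmodified token
public HungerState getHungerState() {
    return hungerState;
}

This changes what lands in the index, but the MultiFieldQueryParser still runs the search string through StandardAnalyzer, so the two sides never agree.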
Any advice would be appreciated. Thanks!