
I'm using Lucene.Net in my project to search for customers. I've built my Lucene index, and searches return the expected results for all of my indexed fields. However, when I search specifically for customers in Indiana or Oregon, I get zero results, even though my database contains customers in both states.

In my test case, these states are abbreviated as IN and OR, respectively, in my Lucene index. Searching on other fields returns customers located in these states, so I know those customers are indexed.

Example:

State:(fl) returns results for customers in Florida, as expected.
State:(in) returns no results
State:(or) returns no results
State:(ar*) returns results for customers in Arkansas, as expected.
State:(in*) returns no results
State:(or*) returns no results
State:("mi") returns results for customers in Michigan, as expected.
State:("or") returns no results
State:("in") returns no results
State:("\\ca") returns results for customers in California, as expected.
State:("\\or") returns no results
State:("\\in") returns no results

On a related note, searching for names that begin with AND, OR, and IN works without issue:

Name:(and*) returns results for Andrew, Andrea, Andy, etc.
Name:(in*) returns results for Inge, Ina, Indie, etc.
Name:(or*) returns results for Oris, Orlando, Orville, etc.

I've tried each of the following when building my index:

new Field("State", (String.IsNullOrWhiteSpace(ShippingState) ? "" : ShippingState), Field.Store.YES, Field.Index.ANALYZED);

new Field("State", (String.IsNullOrWhiteSpace(BillingState) ? "" : BillingState), Field.Store.YES, Field.Index.ANALYZED);

new Field("State", (String.IsNullOrWhiteSpace(ShippingState) ? "" : ShippingState) + " " + (String.IsNullOrWhiteSpace(BillingState) ? "" : BillingState), Field.Store.YES, Field.Index.ANALYZED);

I've also looked at solutions to similar problems, such as "how to properly escape OR and AND in lucene query?", but I haven't been able to adapt them to this issue. I'm using Lucene.NET 3.0.3.


1 Answer


The problem here isn't really a collision with query syntax: "IN" isn't even a Lucene query keyword.

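The problem is that standard analysis removes certain common words, known as stop words, which are deemed not to be useful search terms. By default, the stop words are common English words, including "in", "or" and "and", among others (full list here: What is the default list of stopwords used in Lucene's StopFilter?). Because StandardAnalyzer lowercases tokens before filtering them, "IN" and "OR" are dropped from the State field at index time, and the same words are dropped from your query when it is parsed, so State:(in) ends up matching nothing.

This also explains the prefix queries: QueryParser does not pass wildcard or prefix terms such as in* through the analyzer, so Name:(in*) still matches indexed names like "inge", while State:(in*) finds nothing because the term "in" was never written to the State field.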
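To see the stop word filtering directly, the sketch below runs the default StandardAnalyzer over a few state codes and prints the terms it would index. It assumes the Lucene.Net 3.0.3 analysis API (TokenStream, AddAttribute<ITermAttribute>(), StopAnalyzer.ENGLISH_STOP_WORDS_SET); StopWordDemo and Analyze are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;

public static class StopWordDemo
{
    // Hypothetical helper: runs the analyzer over one field value and
    // returns the terms that would actually be written to the index.
    public static List<string> Analyze(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.TokenStream("State", new StringReader(text)))
        {
            ITermAttribute term = stream.AddAttribute<ITermAttribute>();
            while (stream.IncrementToken())
            {
                terms.Add(term.Term);
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Default StandardAnalyzer: lowercases tokens, then removes English stop words.
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        Console.WriteLine(string.Join(",", Analyze(analyzer, "FL")));           // fl
        Console.WriteLine(Analyze(analyzer, "IN").Count);                       // 0
        Console.WriteLine(Analyze(analyzer, "OR").Count);                       // 0
        Console.WriteLine(StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains("or"));  // True
    }
}
```

"FL" survives as the lowercased term "fl", while "IN" and "OR" produce no terms at all, which is why there is nothing in the index for State:(in) or State:(or) to match.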

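If this isn't the behavior you want, you can construct your StandardAnalyzer with a custom (or empty) stop word set. Use that analyzer both for indexing and for parsing queries, and rebuild the index, because the stop words were already discarded from the documents you indexed earlier: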

using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;

StandardAnalyzer analyzer = new StandardAnalyzer(
    Lucene.Net.Util.Version.LUCENE_30,
    new HashSet<string>() // Empty stop word set: no terms are discarded
);
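Putting the fix together, here is a hedged end-to-end sketch against an in-memory index. Instead of an empty set, it uses the default English stop list minus "in" and "or" (an assumption that those are the only codes that collide in your data), and it passes that one analyzer to both IndexWriter and QueryParser. It assumes the Lucene.Net 3.0.3 API (RAMDirectory, IndexWriter.MaxFieldLength, TopDocs.TotalHits); StateSearchDemo, BuildAnalyzer, BuildIndex and CountHits are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class StateSearchDemo
{
    // Default English stop words minus the two that collide with state codes.
    // Assumption: "in" and "or" are the only collisions in this data set.
    public static Analyzer BuildAnalyzer()
    {
        var stopWords = new HashSet<string>(StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        stopWords.Remove("in");
        stopWords.Remove("or");
        return new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30, stopWords);
    }

    // Indexes one document per state code, analyzing with the given analyzer.
    public static Directory BuildIndex(Analyzer analyzer, params string[] states)
    {
        var dir = new RAMDirectory();
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string state in states)
            {
                var doc = new Document();
                doc.Add(new Field("State", state, Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }
        return dir;
    }

    // Parses the query with the SAME analyzer used at index time and counts matches.
    public static int CountHits(Directory dir, Analyzer analyzer, string queryText)
    {
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "State", analyzer);
        Query query = parser.Parse(queryText);
        using (var searcher = new IndexSearcher(dir, true))
        {
            return searcher.Search(query, 10).TotalHits;
        }
    }

    public static void Main()
    {
        Analyzer analyzer = BuildAnalyzer();
        Directory dir = BuildIndex(analyzer, "IN", "OR", "FL");

        Console.WriteLine(CountHits(dir, analyzer, "State:(in)")); // 1
        Console.WriteLine(CountHits(dir, analyzer, "State:(or)")); // 1
    }
}
```

Trimming the default set rather than emptying it keeps genuine noise words such as "the" filtered out of any other fields that share this analyzer.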