
I am new to Lucene.NET. I am adding fields to a Lucene document as Field.Index.NOT_ANALYZED. There is one default field, which is added to the document as Field.Index.ANALYZED.

I have no difficulty searching the default field, but when I search on a specific field Lucene returns 0 documents. However, if I change Field.Index.NOT_ANALYZED to Field.Index.ANALYZED, then things work properly. I think it has something to do with the Analyzer. Can anybody guide me on how to search a Field.Index.NOT_ANALYZED field?

Here is how I am creating the query parser:

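// "content" is the default field; the same StandardAnalyzer is applied to
// query text for every field, including NOT_ANALYZED ones.
QueryParser parser = 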
    new QueryParser(
        Version.LUCENE_30, 
        "content", 
        new StandardAnalyzer(Version.LUCENE_30));
Can you post an example of how you are indexing and what type of values those fields have? And how are you searching the index? Also, is there any particular reason you are using NOT_ANALYZED for most fields? (rae1)

2 Answers


ANALYZED just means that the value is passed through an Analyzer before being indexed, while NOT_ANALYZED means that the value is indexed as-is. The latter means that a value like "hello world" is indexed as exactly that: the single term "hello world". However, the QueryParser syntax treats whitespace as a term separator (and the StandardAnalyzer it is given tokenizes the text as well), so the query text becomes two terms, "hello" and "world", neither of which exists in that field.

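You will be able to match the field if you build the query directly with var q = new TermQuery(new Term(field, "hello world")) instead of running the text through the QueryParser (for example var q = queryParser.Parse(field + ":\"hello world\"")). TermQuery skips analysis entirely and looks up the exact indexed term.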
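A minimal Lucene.NET 3.0 sketch of the difference, assuming a one-document in-memory index. The field name title, the class NotAnalyzedDemo and its helper methods are hypothetical. The parsed phrase query is expected to find no hits, while the TermQuery should find the document:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version; // avoid clashing with System.Version

public static class NotAnalyzedDemo
{
    // Builds an in-memory index with one document. "title" (hypothetical) is
    // NOT_ANALYZED; "content" is the ANALYZED default field.
    public static Directory BuildIndex()
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("title", "hello world", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("content", "hello world", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        return dir;
    }

    // Runs a query against the index and returns the total hit count.
    public static int CountHits(Directory dir, Query query)
    {
        using (var searcher = new IndexSearcher(dir, true))
        {
            return searcher.Search(query, 10).TotalHits;
        }
    }

    public static void Main()
    {
        var dir = BuildIndex();
        var parser = new QueryParser(Version.LUCENE_30, "content", new StandardAnalyzer(Version.LUCENE_30));

        // The parser runs the text through StandardAnalyzer, producing a phrase
        // of two terms (hello, world) that never matches the single indexed term.
        Query parsed = parser.Parse("title:\"hello world\"");

        // TermQuery bypasses analysis and looks up the exact indexed term.
        Query exact = new TermQuery(new Term("title", "hello world"));

        Console.WriteLine("{0} -> {1} hits", parsed, CountHits(dir, parsed));
        Console.WriteLine("{0} -> {1} hits", exact, CountHits(dir, exact));
    }
}
```

Note that TermQuery is case-sensitive here too: new Term("title", "Hello World") would not match.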


The issue seems to be that the search values do not literally match the values currently indexed; in other words, you are trying to match a document containing hello world with a search for Hello World. Since all your fields are marked as NOT_ANALYZED, Lucene does not process the values with an analyzer/tokenizer; it simply indexes each value exactly as it is passed in, so a string like hello world becomes the single term hello world. For a search to return a match on that document, the search term needs to be exactly

hello world 

and not Hello World, hello world. or Hello. All of these searches will return zero matches. From Lucene's point of view, expecting them to match would be like searching for the number 3 and expecting a match on 2 or 4 (as illogical as that might sound).

This is why NOT_ANALYZED is usually recommended only for ID-type fields, where you want the search to return an exact match rather than a list of related or similar values.
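If such ID-type fields also need to be searchable through QueryParser, one common approach (a sketch, not something this answer prescribes) is to give the parser a PerFieldAnalyzerWrapper that assigns KeywordAnalyzer to those fields, so their query text stays a single token just as it was indexed. The field names id and content, the value Order 42, and the class KeywordFieldDemo are hypothetical. The value must be quoted in the query, because the parser still splits unquoted text on whitespace before the analyzer sees it:

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class KeywordFieldDemo
{
    // StandardAnalyzer for every field except "id", which uses KeywordAnalyzer:
    // the whole input becomes one token, mirroring a NOT_ANALYZED field.
    public static Analyzer BuildAnalyzer()
    {
        var wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
        wrapper.AddAnalyzer("id", new KeywordAnalyzer());
        return wrapper;
    }

    public static Directory BuildIndex()
    {
        var dir = new RAMDirectory();
        using (var writer = new IndexWriter(dir, BuildAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("id", "Order 42", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("content", "Order 42 shipped", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        return dir;
    }

    // Parses the query text with the per-field analyzer and counts hits.
    public static int CountHits(Directory dir, string queryText)
    {
        var parser = new QueryParser(Version.LUCENE_30, "content", BuildAnalyzer());
        Query query = parser.Parse(queryText);
        using (var searcher = new IndexSearcher(dir, true))
        {
            return searcher.Search(query, 10).TotalHits;
        }
    }

    public static void Main()
    {
        var dir = BuildIndex();
        // Quoted value reaches KeywordAnalyzer intact and becomes a TermQuery.
        Console.WriteLine(CountHits(dir, "id:\"Order 42\""));
        // Still an exact match: different casing finds nothing.
        Console.WriteLine(CountHits(dir, "id:\"order 42\""));
    }
}
```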

The advantage of using ANALYZED is that searching becomes more intuitive and forgiving. Indexing a value like hello world breaks it down into tokens (so a search for just hello or just world matches), and with StandardAnalyzer the tokens are indexed in lowercase to avoid mismatches due to different casing (like Hello World or hELLO). Substrings such as ello are not tokens, though, so matching them still requires a wildcard query.
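A minimal sketch of that behavior with StandardAnalyzer; the class AnalyzedFieldDemo and its methods are hypothetical. Because the parser analyzes query text the same way the field was indexed, a mixed-case query should match, while a raw TermQuery for Hello or ello should not. Tokens are whole words, so a fragment like ello only matches through something like a WildcardQuery:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class AnalyzedFieldDemo
{
    public static Directory BuildIndex()
    {
        var dir = new RAMDirectory();
        using (var writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // StandardAnalyzer indexes this as the lowercase terms "hello" and "world".
            doc.Add(new Field("content", "Hello World", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        return dir;
    }

    public static int CountHits(Directory dir, Query query)
    {
        using (var searcher = new IndexSearcher(dir, true))
        {
            return searcher.Search(query, 10).TotalHits;
        }
    }

    public static void Main()
    {
        var dir = BuildIndex();
        var parser = new QueryParser(Version.LUCENE_30, "content", new StandardAnalyzer(Version.LUCENE_30));

        // Parsed query text is lowercased too, so casing does not matter.
        Console.WriteLine(CountHits(dir, parser.Parse("hELLO")));
        // TermQuery skips analysis; "Hello" was never indexed with that casing.
        Console.WriteLine(CountHits(dir, new TermQuery(new Term("content", "Hello"))));
        // "ello" is not a token, so an exact term lookup finds nothing...
        Console.WriteLine(CountHits(dir, new TermQuery(new Term("content", "ello"))));
        // ...but a wildcard pattern over the indexed terms does.
        Console.WriteLine(CountHits(dir, new WildcardQuery(new Term("content", "*ello"))));
    }
}
```

Leading wildcards like *ello scan every term in the field, so they can be slow on large indexes; the classic QueryParser rejects them unless leading wildcards are explicitly allowed.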