I am trying to build an application that implements a search system over a Lucene index. The index is built, I can search for documents in it, and everything seems to work, except that when I search on a field value shared by many documents, the search returns only some of them. I ran the same search in Luke and it behaves the same way.
For example, my index has two fields:
Field A: A unique identifier.
Field B: A String.
First Example:
We have 5 documents:
Doc 1: FieldA:1; FieldB:hello world
Doc 2: FieldA:2; FieldB:hello world!
Doc 3: FieldA:3; FieldB:hello world
Doc 4: FieldA:4; FieldB:anything
Doc 5: FieldA:5; FieldB:hello world
When I make a search like "B: hello world" it should return documents 1, 3 and 5, but it only returns 1 and 3.
When I make a search like "A: 5" it returns the document 5 and the field B value is "hello world".
Second Example: (one token)
Doc 6: FieldA:6; FieldB:token
Doc 7: FieldA:7; FieldB:token
Doc 8: FieldA:8; FieldB:TOKEN
Doc 9: FieldA:9; FieldB:token
When I search FieldB:"token" it only returns Doc 6 and Doc 9. The only way I can find Doc 7 is by searching on its FieldA.
I am using WhitespaceAnalyzer, and both fields are indexed as NOT_ANALYZED.
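To make the expected behaviour concrete, here is a plain-Java sketch with no Lucene calls. The class ExactTermSketch and its methods are hypothetical names made up for illustration. It assumes, as I understand NOT_ANALYZED, that the whole field value is indexed as a single term, so only an exact, case-sensitive match on the full string should hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExactTermSketch {

    // Hypothetical stand-in for the index: FieldA (id) -> FieldB (kept as one untokenized term).
    public static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<String, String>();
        docs.put("1", "hello world");
        docs.put("2", "hello world!");
        docs.put("3", "hello world");
        docs.put("4", "anything");
        docs.put("5", "hello world");
        docs.put("6", "token");
        docs.put("7", "token");
        docs.put("8", "TOKEN");
        docs.put("9", "token");
        return docs;
    }

    // Returns the ids whose FieldB equals the query term exactly
    // (case-sensitive, no tokenization), in insertion order.
    public static List<String> exactMatch(Map<String, String> docs, String term) {
        List<String> ids = new ArrayList<String>();
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            if (entry.getValue().equals(term)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> docs = sampleDocs();
        System.out.println(exactMatch(docs, "hello world")); // prints [1, 3, 5]
        System.out.println(exactMatch(docs, "token"));       // prints [6, 7, 9]
    }
}
```

Under that assumption, Doc 2 ("hello world!") and Doc 8 ("TOKEN") are the only ones I would expect to be left out, which is why the missing Doc 5 and Doc 7 surprise me.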
IndexGenerator Main
...
// create = true: builds a new index, overwriting any existing one in the directory
IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(200);
List<Work> works = getWorks(); //Retrieves the information from the DB
for (Work work : works) {
    Document luceneDocument = createLuceneDocument(work);
    writer.addDocument(luceneDocument);
}
writer.commit();
...
CreateLuceneDocument Method:
private static Document createLuceneDocument(Work work) {
    try {
        Document luceneDoc = new Document();
        ...
        // Both fields are stored and indexed as a single untokenized term
        Field id = new Field("ID", work.getId(), Field.Store.YES, Field.Index.NOT_ANALYZED);
        luceneDoc.add(id);
        Field name = new Field("NAME", work.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED);
        luceneDoc.add(name);
        ...
        return luceneDoc;
    }
    catch (LuceneException e) {
        ...
    }
}
I have noticed that the documents that are not returned have a low score. Since Luke behaves the same way as the application, I assume the problem lies in how the index is created. What am I doing wrong?
Thanks in advance!