first of all, sorry for my bad English!
i am new to Lucene Library(Since last Wednesday) and im trying to understand how to get best relevance level of matching documents based on the terms found.
i use Lucene 4.10.0 (no Solr)
I'm able to index/search english/arabic text as well as supporting hit highlighting for these texts.
now i have a Problem with the relevance of search results.
if i search for "Mohammad Omar" in three docs:
doc1.add(new TextField("contents", "xyz abc, 123 Mohammad Abu Omar 123", Field.Store.YES));
indexWriter.addDocument(config.build(taxoWriter, doc1));
doc2 = new Document();
doc2.add(new TextField("contents", "xyz abc, 123 Omar bin Mohammad 123", Field.Store.YES));
indexWriter.addDocument(config.build(taxoWriter, doc2));
doc3 = new Document();
doc3.add(new TextField("contents", "xyz abc, 123 Abu Mohammad Omar 123", Field.Store.YES));
indexWriter.addDocument(config.build(taxoWriter, doc3));
...etc
i get same Score for these 3 docs.
it looks like Lucene ignores the Words Order and just scoring on the Matches Count.
i expect the following as best Results:
doc3 THEN doc1 THEN doc2
but i get:
doc1 THEN doc2 THEN doc3 (ALL HAVE SAME SCORE)
for searching in lowercase and in substrings i use an extended Analyzer like this:
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
Tokenizer source = new WhitespaceTokenizer(reader);
TokenStream filter = new LowerCaseFilter(source);
filter = new WordDelimiterFilter(filter,Integer.MAX_VALUE,null);
return new TokenStreamComponents(source, filter);
}
any idea how to perform it?
i see that Boosting Query Terms AND/OR using RegEx could be an Option, but this means, i have to handle User inputs manually. isn't there an "out of box" Solution(like a function, Filter or Analyzer)?
many thanks!