Does Lucene use tf-idf as term weights? Is it possible to define your own statistics and weights for each document and "plug" them into Lucene?
2 Answers
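Yes. The default scoring algorithm incorporates tf-idf and is fully documented in the TFIDFSimilarity documentation. This applies to Lucene versions before 6.0; since Lucene 6.0 the default similarity is BM25Similarity.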
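To make the tf-idf factors concrete, here is a standalone Java sketch of the classic formulas that Lucene 4.x's DefaultSimilarity documents: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of boost / sqrt(numTerms). The class name `ClassicTfIdf` and its `termScore` helper are hypothetical, not Lucene code. The helper keeps only the tf · idf² · norm part of the TFIDFSimilarity formula and leaves out queryNorm (constant for a given query) and coord (1 for a single-term query). Real Lucene also stores the norm lossily in a single byte per field, so actual scores differ slightly.

```java
/**
 * Standalone sketch of the per-term factors of Lucene's classic TF-IDF
 * scoring, modeled on the formulas documented for DefaultSimilarity (4.x).
 * Hypothetical illustration only; this is not Lucene's implementation.
 */
public final class ClassicTfIdf {
    private ClassicTfIdf() {}

    /** Term-frequency factor: square root of the raw frequency in the document. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)). */
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Length norm: an index-time field boost divided by sqrt(field length). */
    static float lengthNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Simplified contribution of one query term to one document's score:
     * tf * idf^2 * norm. Omits queryNorm and coord (see lead-in).
     */
    static float termScore(float freq, long docFreq, long numDocs, float boost, int numTerms) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(boost, numTerms);
    }

    public static void main(String[] args) {
        // Hypothetical corpus statistics: 1000 docs, the term occurs in 9 of them.
        long numDocs = 1000, docFreq = 9;
        // Same term frequency (4) in a 100-term field and a 400-term field.
        float shortDoc = termScore(4, docFreq, numDocs, 1.0f, 100);
        float longDoc = termScore(4, docFreq, numDocs, 1.0f, 400);
        System.out.printf("short=%.4f long=%.4f%n", shortDoc, longDoc);
    }
}
```

With identical term frequencies, the shorter field scores twice as high here, because its length norm is 1/sqrt(100) instead of 1/sqrt(400). A field boost passed to `lengthNorm` scales the score the same way.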
There are a number of ways to customize the scoring of documents.
- The simplest and most common is to incorporate a boost, either on a field at index time, or on a query term when querying.
- Many query types modify the scoring used for that query. Examples include ConstantScoreQuery and DisjunctionMaxQuery.
- The Similarity you use defines the scoring algorithm. You can select a different one (e.g. BM25Similarity).
- You can implement your own Similarity, usually by extending a higher-level implementation such as DefaultSimilarity, TFIDFSimilarity, or SimilarityBase.
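The last two options describe a plug-in design: the scorer asks a Similarity for its factors, so swapping or subclassing the Similarity changes the scores. The sketch below models that idea in plain Java so it runs without Lucene on the classpath; `ToySimilarity`, `ClassicLike` and `BinaryTf` are hypothetical names, not Lucene classes, and the score is a plain sum of tf × idf per query term (simpler than Lucene's formula). `BinaryTf` mirrors the advice to extend a higher-level implementation: it inherits the classic idf and overrides only tf.

```java
import java.util.List;
import java.util.Map;

/** Standalone model of a pluggable similarity. Hypothetical classes, not Lucene's API. */
public final class PluggableScoring {

    /** Stand-in for a Similarity: per-term factors a subclass can override. */
    static abstract class ToySimilarity {
        abstract double tf(int freq);

        abstract double idf(long docFreq, long numDocs);

        /** Toy score: sum of tf * idf over the query terms present in the document. */
        double score(List<String> query, Map<String, Integer> docTermFreqs,
                     Map<String, Long> docFreqs, long numDocs) {
            double sum = 0.0;
            for (String term : query) {
                int freq = docTermFreqs.getOrDefault(term, 0);
                if (freq == 0) continue; // absent terms contribute nothing
                sum += tf(freq) * idf(docFreqs.getOrDefault(term, 0L), numDocs);
            }
            return sum;
        }
    }

    /** Classic-style factors: sqrt(freq) and 1 + ln(N / (df + 1)). */
    static class ClassicLike extends ToySimilarity {
        @Override
        double tf(int freq) { return Math.sqrt(freq); }

        @Override
        double idf(long docFreq, long numDocs) {
            return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
        }
    }

    /** Custom model: ignore repetition (binary tf) but keep the inherited idf. */
    static class BinaryTf extends ClassicLike {
        @Override
        double tf(int freq) { return freq > 0 ? 1.0 : 0.0; }
    }

    public static void main(String[] args) {
        // Hypothetical statistics for a 1000-document corpus.
        Map<String, Long> docFreqs = Map.of("lucene", 9L, "score", 99L);
        Map<String, Integer> doc = Map.of("lucene", 4, "score", 1);
        List<String> query = List.of("lucene", "score");
        for (ToySimilarity sim : List.of(new ClassicLike(), new BinaryTf())) {
            System.out.printf("%s: %.4f%n", sim.getClass().getSimpleName(),
                    sim.score(query, doc, docFreqs, 1000));
        }
    }
}
```

In actual Lucene 4.x code the equivalent would be a subclass of DefaultSimilarity that overrides `tf(float)` and/or `idf(long, long)`, registered with both `IndexWriterConfig.setSimilarity(...)` and `IndexSearcher.setSimilarity(...)` so that index-time norms and query-time scoring use the same model.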
Take a look at this example; it may help you see how to customize the indexing process:
http://lucene.apache.org/core/4_3_1/demo/src-html/org/apache/lucene/demo/IndexFiles.html