0
votes

Does a Lucene index use tf-idf as term weights? Is it possible to define your own statistics and weights for each document, and "plug" them into Lucene?

2 Answers

1
votes

Yes, the default scoring algorithm incorporates tf-idf, and is fully documented in the TFIDFSimilarity documentation.
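For orientation, here is the practical scoring function roughly as the 4.x TFIDFSimilarity documentation presents it. This is written from memory of those docs, so treat the exact form as an assumption and check the Javadoc for your version. The symbol boost(t) stands for the query-time term boost, and norm(t,d) folds in index-time boosts and the field length norm:

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big)

% Defaults in the 4.x DefaultSimilarity, as far as I recall:
\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}}, \qquad
\mathrm{idf}(t) = 1 + \ln\!\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq}+1}\right)
```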

There are a number of ways to customize the scoring of documents.

  • The simplest and most common is to incorporate a boost, either on a field at index time, or on a query term when querying.
  • Many query types modify the scoring used for that query. Examples include ConstantScoreQuery and DisjunctionMaxQuery.
  • The Similarity you use defines the scoring algorithm. You could select a different one (e.g. BM25Similarity).
  • You can implement your own Similarity, usually by extending a higher-level implementation such as DefaultSimilarity, TFIDFSimilarity, or SimilarityBase.
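To make the last option more concrete, here is a minimal plain-Java sketch of the idea behind overriding a Similarity. It deliberately does not use Lucene's classes: `TermWeighting`, `ClassicWeighting`, `BinaryTfWeighting` and `ScoringSketch` are hypothetical names invented for illustration. The two formulas in `ClassicWeighting` are what I believe the 4.x DefaultSimilarity uses for `tf` and `idf`; coord, queryNorm, boosts and length norms are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-term hooks of a similarity; NOT Lucene's API.
interface TermWeighting {
    double tf(double freq);
    double idf(long docFreq, long numDocs);
}

// Assumed to mirror the 4.x DefaultSimilarity defaults: sqrt(freq) and 1 + ln(numDocs / (docFreq + 1)).
class ClassicWeighting implements TermWeighting {
    @Override
    public double tf(double freq) {
        return Math.sqrt(freq);
    }

    @Override
    public double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }
}

// Custom variant: extend the default and override one piece, so that a term
// counts once per document no matter how often it occurs.
class BinaryTfWeighting extends ClassicWeighting {
    @Override
    public double tf(double freq) {
        return freq > 0 ? 1.0 : 0.0;
    }
}

class ScoringSketch {
    // Simplified score: sum over matching query terms of tf * idf^2.
    // Omits coord, queryNorm, boosts and length norms.
    static double score(TermWeighting weighting,
                        String[] queryTerms,
                        Map<String, Integer> termFreqsInDoc,
                        Map<String, Integer> docFreqs,
                        long numDocs) {
        double sum = 0.0;
        for (String term : queryTerms) {
            int freq = termFreqsInDoc.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // term does not occur in this document
            }
            double idf = weighting.idf(docFreqs.getOrDefault(term, 0), numDocs);
            sum += weighting.tf(freq) * idf * idf;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> termFreqsInDoc = new HashMap<>();
        termFreqsInDoc.put("lucene", 4);
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("lucene", 9);
        String[] query = {"lucene", "scoring"};

        System.out.println("classic: " + score(new ClassicWeighting(), query, termFreqsInDoc, docFreqs, 10));
        System.out.println("binary:  " + score(new BinaryTfWeighting(), query, termFreqsInDoc, docFreqs, 10));
    }
}
```

In real Lucene 4.x code the equivalent would be, I think, a subclass of DefaultSimilarity that overrides `tf(float)` or `idf(long, long)`, set on both the IndexWriterConfig and the IndexSearcher so that indexing and searching agree. Check the Javadoc for your version before relying on those signatures.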
0
votes

Take a look at this example. It may help you see how to customize the indexing process:

http://lucene.apache.org/core/4_3_1/demo/src-html/org/apache/lucene/demo/IndexFiles.html