Lucene Scoring Function - bias towards shorter documents

Question

I want the Lucene scoring function to have no bias based on document length. This is really a follow-up to the question "Calculate the score only based on the documents have more occurance of term in lucene".

How does Field.setOmitNorms(true) work? I see that there are two factors that give short documents a higher score:

I was wondering - if I wanted no bias towards shorter documents, is Field.setOmitNorms(true) enough?

Look into custom Similarity implementations (derive from DefaultSimilarity and override lengthNorm, tf, idf and the other methods used in score calculation). It may help you understand the process further. - sisve
We saw the same effect. What worked well for us was combining Field.setOmitNorms(true) with a custom similarity: searcher.setSimilarity(new DefaultSimilarity() { @Override public float tf(float freq) { return 1; } }); This switched off both term counting and document length normalization. - fricke
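To make the two comments above concrete, here is a minimal plain-Java sketch of the two classic TF-IDF factors they mention. It does not use Lucene. The class and method names are hypothetical, and it assumes the classic formulas tf = sqrt(freq) and lengthNorm = 1 / sqrt(numTerms), with omitted norms modeled as a constant factor of 1.0 and idf left out because it does not depend on document length.

```java
// Hypothetical illustration only: a simplified model, not Lucene's actual code.
final class ClassicLengthBiasSketch {

    // Assumed classic length norm: shorter fields get a larger factor.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Assumed classic term-frequency factor.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Per-term contribution without idf. When omitNorms is true the
    // length norm is modeled as 1.0, so field length drops out.
    static double contribution(double freq, int numTerms, boolean omitNorms) {
        double norm = omitNorms ? 1.0 : lengthNorm(numTerms);
        return tf(freq) * norm;
    }

    public static void main(String[] args) {
        // Same term frequency, one short field and one long field.
        System.out.println("norms on,  short: " + contribution(2, 10, false));
        System.out.println("norms on,  long:  " + contribution(2, 1000, false));
        System.out.println("norms off, short: " + contribution(2, 10, true));
        System.out.println("norms off, long:  " + contribution(2, 1000, true));
    }
}
```

In this model the short field scores higher only while norms are enabled. Overriding tf to return 1, as in the second comment, would additionally make the contribution independent of how often the term occurs.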

Guillaume Malartre · Accepted Answer · 2017-06-05T18:04:56

With BM25Similarity you can set either of the following parameters to 0f:

@param b Controls to what degree document length normalizes tf values

or

@param k1 Controls non-linear term frequency normalization (saturation).

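Both parameters affect the SimWeight. Setting b to 0f removes document length normalization while keeping term frequency saturation. Setting k1 to 0f makes the term frequency component constant, so both term frequency and document length are ignored. For example, to keep the default k1 and switch off length normalization: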

indexSearcher.setSimilarity(new BM25Similarity(1.2f,0f));
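As a rough check of why b = 0f removes the length bias, here is a plain-Java sketch of the textbook BM25 term-frequency component. It does not call Lucene, the class and method names are hypothetical, and Lucene's own implementation differs in details (for example, document lengths are stored in a lossy encoded form), so treat it as an illustration of the formula rather than of the library.

```java
// Hypothetical illustration of the textbook BM25 tf component, not Lucene code.
final class Bm25TfSketch {

    // freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLen / avgDocLen))
    static double tfNorm(double freq, double docLen, double avgDocLen,
                         double k1, double b) {
        // With b = 0 this factor is exactly 1, so docLen has no effect.
        double lengthFactor = 1.0 - b + b * (docLen / avgDocLen);
        return freq * (k1 + 1.0) / (freq + k1 * lengthFactor);
    }

    public static void main(String[] args) {
        double avgLen = 100.0;
        // b = 0.75: the short document gets the higher value.
        System.out.println("b=0.75 short: " + tfNorm(3, 50, avgLen, 1.2, 0.75));
        System.out.println("b=0.75 long:  " + tfNorm(3, 500, avgLen, 1.2, 0.75));
        // b = 0: both documents get the same value.
        System.out.println("b=0    short: " + tfNorm(3, 50, avgLen, 1.2, 0.0));
        System.out.println("b=0    long:  " + tfNorm(3, 500, avgLen, 1.2, 0.0));
        // k1 = 0: the component is constant regardless of freq or length.
        System.out.println("k1=0:         " + tfNorm(7, 500, avgLen, 0.0, 0.75));
    }
}
```

With b = 0 the length term cancels and only the saturation controlled by k1 remains, which matches the setting shown in the answer.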