The closer the better approach in Lucene

Question

I'm pretty new to Lucene, so forgive me in advance if some of my terminology is wrong.

Lucene offers different types of fields (keyword, text, unstored, unindexed), but it seems it also supports Numeric field, Int field and Float field.

Now, I'm wondering if "the closer the better" functionality exists/or is easy to implement in Lucene:

I want the creation_date of a document stored as Unix time in a float field. Then I want to be able to compare the Unix time given in a query with the indexed Unix time of the documents.

Instead of a range query (which checks if the value is between particular bounds) or a boolean query (which checks if the values are the same), I want to be able to return a sense of similarity based on the time between the two Unix times. If the timespan is small, it should end up with a higher score than if the timespan is large. Preferably the score should not fall off linearly but, for example, exponentially. So, as the title of this question says: the closer, the better.

I've noticed that Elasticsearch, which uses Lucene as its core, offers decay function scores. Is this the behaviour that I'm looking for, and is it present in Lucene?

Lastly, I'm wondering: can one combine this type of scoring with the default tf-idf scoring that is used to query the body of the documents, so that the final score is a combination of the timespan score and the textual similarity of the bodies?

Mark Stroeven · Accepted Answer · 2015-09-22T13:31:32

I don't think you get it out of the box like in Elasticsearch. You could always try to add it yourself as a module. Descriptions of these decay algorithms are widely available on the internet.
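For what it's worth, here is a minimal sketch of what such a module might compute, in plain Java with no Lucene dependency. The class name `TimeDecay`, the method `score`, and the `halfLifeSeconds` parameter are made up for illustration. It uses a half-life form of exponential decay, which is one reasonable choice and not necessarily the exact formula Elasticsearch uses.

```java
/** Hypothetical helper: turns the distance between two Unix times into a score. */
public final class TimeDecay {
    private TimeDecay() {}

    /**
     * Returns 1.0 when the two timestamps are equal and halves the score
     * for every halfLifeSeconds of distance between them (exponential decay).
     *
     * @param queryTime       Unix time given in the query, in seconds
     * @param docTime         indexed creation_date of the document, in seconds
     * @param halfLifeSeconds assumed tuning knob: distance at which the score drops to 0.5
     */
    public static double score(long queryTime, long docTime, double halfLifeSeconds) {
        if (halfLifeSeconds <= 0) {
            throw new IllegalArgumentException("halfLifeSeconds must be positive");
        }
        // Convert to double before subtracting so extreme values cannot overflow.
        double distance = Math.abs((double) queryTime - (double) docTime);
        return Math.pow(0.5, distance / halfLifeSeconds);
    }
}
```

With a half-life of one hour, `TimeDecay.score(t, t + 3600, 3600.0)` gives 0.5 and a document two hours away gives 0.25, so nearby documents are strongly preferred. How you plug this into Lucene's scoring depends on the Lucene version you are on.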

You could also use the boosting and negative boosting systems in Lucene in combination with the existing ranking system, and experiment to see whether that gives you the sort of results you want. I am doing that on Apache Solr and it's working like a charm :)

On your last point: tf-idf scoring is available in Solr, and since Solr is built on Lucene it is already in Lucene as well (see `TFIDFSimilarity`). You can combine your own module with the tf-idf scoring to achieve a combined result.
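One way to picture the combination, as a rough sketch only: treat the time score as a multiplicative boost on the text score. The class `ScoreCombiner`, the method `combine`, and the `timeWeight` parameter are hypothetical names for illustration, not Lucene or Solr API. Here `textScore` stands for whatever tf-idf score the engine produced, and `timeScore` is assumed to be a closeness score between 0 and 1.

```java
/** Hypothetical helper: blends a text relevance score with a time-closeness score. */
public final class ScoreCombiner {
    private ScoreCombiner() {}

    /**
     * Boosts the text score by the time score.
     * A timeScore of 0 leaves the text score unchanged; a timeScore of 1
     * multiplies it by (1 + timeWeight).
     *
     * @param textScore  tf-idf style score from the text query (unbounded, >= 0)
     * @param timeScore  closeness score, assumed to lie in [0, 1]
     * @param timeWeight assumed tuning knob: how much closeness may boost the result
     */
    public static double combine(double textScore, double timeScore, double timeWeight) {
        if (timeWeight < 0) {
            throw new IllegalArgumentException("timeWeight must not be negative");
        }
        return textScore * (1.0 + timeWeight * timeScore);
    }
}
```

I went with a multiplicative boost here because tf-idf scores are not bounded, so a plain weighted sum would need the text score normalized first. A weighted sum is just as valid if you normalize; it is worth experimenting with both.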
