I would like to use Solr to index documents with term weights.

Doc1: this(w=0.3) is(w=0.4) the(w=0.1) first(w=0.7) doc(w=0.2)

Doc2: this(w=0.1) is(w=0.2) the(w=0.5) second(w=0.8) doc(w=0.1)

Note that the weight for the same term can be different for two documents.

After indexing I would like the search function to consider these weights when scoring the documents. For example, if the query is "doc", I would like Doc1 to get a higher score.

Is this possible?

Thanks!

Have you seen the Payload Score Parser? Also, see Payloads in Solr from Lucene Solr Revolution in 2017; you can also find the talk on YouTube, IIRC. - MatsLindh
I tried to use the end-to-end example for payloads, but it doesn't work with the latest Solr version, 8.5.0. Is there an example that does something similar to what I need above with payloads and works with Solr 8.5.0? Thanks! - elkon
Please expand your question with what you tried, what problems you ran into, and what didn't work as you expected, since payloads are probably the easiest way to implement this. - MatsLindh
As far as I understand, payloads in and of themselves cannot serve as the term weights in the ranking process. Some extra code needs to be written to that end. I could only find one example online that does that: lucidworks.com/post/end-to-end-payload-example-in-solr. I tried to compile this code with the latest version of Solr (8.5.0), but it doesn't compile because many functions/classes were deprecated (e.g., DefaultSimilarity). In the example in the question, I'd like the word "this" to be "boosted" by 0.3 in Doc1 and by 0.1 in Doc2, and so on. Thanks! - elkon
Isn't that roughly what the Payload Score Parser attempts to do? It's the version of the approach you've linked that was actually committed and is now part of Solr. - MatsLindh

1 Answer

This was pointed out by MatsLindh, thanks!

It can be done using Payloads: https://lucene.apache.org/solr/guide/8_5/other-parsers.html#payload-score-parser

I don't recommend trying to use the older example here, since its custom code does not compile against Solr 8.5.0: https://lucidworks.com/post/end-to-end-payload-example-in-solr/

Here's the solution.

1) Create a new collection:

bin/solr create -c my_docs -s 1 -rf 2

2) Write the following (based on the example in the question) into a CSV file named 1.csv:

id,txt_dpf

1,this|0.3 is|0.4 the|0.1 first|0.7 doc|0.2

2,this|0.1 is|0.2 the|0.5 second|0.8 doc|0.1
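
If the weights already live in code, the CSV above can be generated rather than typed by hand. This is only a minimal sketch: the `docs` structure and the helper names `to_payload_field` and `build_csv` are made up for illustration, and it assumes terms contain no spaces, commas, or `|` characters.

```python
import csv
import io

# Hypothetical in-memory representation of the two documents from the question.
docs = {
    "1": [("this", 0.3), ("is", 0.4), ("the", 0.1), ("first", 0.7), ("doc", 0.2)],
    "2": [("this", 0.1), ("is", 0.2), ("the", 0.5), ("second", 0.8), ("doc", 0.1)],
}


def to_payload_field(weights, delimiter="|"):
    """Join (term, weight) pairs into 'term|weight term|weight ...'."""
    return " ".join(f"{term}{delimiter}{weight}" for term, weight in weights)


def build_csv(docs, field="txt_dpf"):
    """Return CSV text with an 'id' column and one delimited-payload column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", field])
    for doc_id, weights in docs.items():
        writer.writerow([doc_id, to_payload_field(weights)])
    return buf.getvalue()


csv_text = build_csv(docs)

if __name__ == "__main__":
    # Redirect this output into 1.csv before posting it to Solr.
    print(csv_text, end="")
```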

3) Add the content into the collection:

bin/post -c my_docs -type text/csv -out yes docs/csv/1.csv

4) Query the collection: http://localhost:8983/solr/my_docs/select?debug=results&fl=txt_dpf,score&q={!payload_score%20f=txt_dpf%20v=this%20func=max%20includeSpanScore=true}
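
To avoid hand-encoding the local-params string, the same query URL can be assembled programmatically. A rough sketch using only the Python standard library follows; the function name `payload_score_url` is hypothetical, the host, collection, and field are the ones used above, and it assumes the search term is a single token without spaces or special characters.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def payload_score_url(term, field="txt_dpf", func="max", include_span_score=True,
                      base="http://localhost:8983/solr/my_docs/select"):
    """Build a select URL that uses the payload_score query parser."""
    flag = "true" if include_span_score else "false"
    local_params = (
        f"{{!payload_score f={field} v={term} func={func} includeSpanScore={flag}}}"
    )
    params = {"q": local_params, "fl": f"{field},score", "debug": "results"}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base}?{urlencode(params, quote_via=quote)}"


url = payload_score_url("doc")
# Round trip: decoding the URL should give back the original local-params string.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]

if __name__ == "__main__":
    print(url)
    print(decoded_q)
```

Querying for "doc" instead of "this" corresponds to the case asked about in the question, where Doc1 (weight 0.2) should outrank Doc2 (weight 0.1).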

Some important notes:

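  1. The name of the field that holds the weights is IMPORTANT! It has to end with "_dpf" so that it matches the `*_dpf` dynamic field (delimited float payloads) in the default configset.

  2. Use includeSpanScore=true (note the lowercase "i"); otherwise your score will just be the weight.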
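
The effect of `func` and `includeSpanScore` can be illustrated with a toy model. This is not Solr's code, only a sketch of the documented behavior: the payload function reduces the payloads of the matching term to one factor, and that factor is multiplied by the span score only when `includeSpanScore` is true. The span score of 2.0 is an arbitrary made-up number.

```python
def toy_payload_score(payloads, span_score, func="max", include_span_score=False):
    """Toy model: reduce payloads with func, optionally scale by the span score."""
    reducers = {
        "min": min,
        "max": max,
        "average": lambda values: sum(values) / len(values),
    }
    factor = reducers[func](payloads)
    return factor * span_score if include_span_score else factor


# "doc" occurs once per document, with payload 0.2 in Doc1 and 0.1 in Doc2.
span_score = 2.0  # made-up value standing in for the underlying query's score
doc1_plain = toy_payload_score([0.2], span_score)
doc2_plain = toy_payload_score([0.1], span_score)
doc1_scaled = toy_payload_score([0.2], span_score, include_span_score=True)

if __name__ == "__main__":
    print(doc1_plain, doc2_plain, doc1_scaled)
```

In both modes Doc1 ranks above Doc2 for "doc"; the flag only decides whether the regular relevance score also contributes.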

@MatsLindh, thanks again!