
We are facing a strange issue while querying Solr. SolrCloud gives a different score than our Solr master/slave setup for the same content. In addition, SolrCloud changes the score for the same content and the same query between requests, which causes the order of documents to differ across calls. On master/slave the score is fixed and does not change between calls.

Here are the score calculations for the same content on the master/slave setup:

63372217#83#-2128821991:
8.439063 = boost(((title:narendra | keywords:narendra) (title:mod | keywords:mod))~1,1.0/(3.16E-11*float(ms(const(1524117881692),date(effectivetriedate)))+1.0)), product of:
  9.141734 = sum of:
    9.141734 = max of:
      9.141734 = weight(title:mod in 10186378) [SchemaSimilarity], result of:
        9.141734 = score(doc=10186378,freq=1.0 = termFreq=1.0), product of:
          9.458362 = idf(docFreq=805, docCount=10322376)
          0.96652406 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.75 = parameter b
            6.5560484 = avgFieldLength
            7.111111 = fieldLength
      8.783037 = weight(keywords:mod in 10186378) [SchemaSimilarity], result of:
        8.783037 = score(doc=10186378,freq=1.0 = termFreq=1.0), product of:
          8.783037 = idf(docFreq=886, docCount=5782333)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.0 = parameter b (norms omitted for field)
  0.92313594 = 1.0/(3.16E-11*float(ms(const(1524117881692),date(effectivetriedate)=2018-03-19T18:09:00Z))+1.0)

60930380#83#-2128833038:
8.3860035 = boost(((title:narendra | keywords:narendra) (title:mod | keywords:mod))~1,1.0/(3.16E-11*float(ms(const(1524117881692),date(effectivetriedate)))+1.0)), product of:
  12.907965 = sum of:
    4.1249275 = max of:
      4.1249275 = weight(keywords:narendra in 3310267) [SchemaSimilarity], result of:
        4.1249275 = score(doc=3310267,freq=1.0 = termFreq=1.0), product of:
          4.1249275 = idf(docFreq=93469, docCount=5782333)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.0 = parameter b (norms omitted for field)
    8.783037 = max of:
      8.783037 = weight(keywords:mod in 3310267) [SchemaSimilarity], result of:
        8.783037 = score(doc=3310267,freq=1.0 = termFreq=1.0), product of:
          8.783037 = idf(docFreq=886, docCount=5782333)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.0 = parameter b (norms omitted for field)
  0.6496767 = 1.0/(3.16E-11*float(ms(const(1524117881692),date(effectivetriedate)=2017-10-03T18:02:13Z))+1.0)

For the SolrCloud setup (the same document, from two separate requests):

63372217#83#-2128821991=
8.45718 = boost(((title:narendra | keywords:narendra) (title:mod | keywords:mod))~1,1.0/(3.16E-11*float(ms(const(1524118417608),date(effectivetriedate)))+1.0)), product of:
  9.161503 = sum of:
    9.161503 = max of:
      9.161503 = weight(title:mod in 49446) [SchemaSimilarity], result of:
        9.161503 = score(doc=49446,freq=1.0 = termFreq=1.0), product of:
          9.522509 = idf(docFreq=298, docCount=4078658)
          0.96208924 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.75 = parameter b
            6.4863324 = avgFieldLength
            7.111111 = fieldLength
      8.878012 = weight(keywords:mod in 49446) [SchemaSimilarity], result of:
        8.878012 = score(doc=49446,freq=1.0 = termFreq=1.0), product of:
          8.878012 = idf(docFreq=319, docCount=2291617)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.0 = parameter b (norms omitted for field)
  0.9231215 = 1.0/(3.16E-11*float(ms(const(1524118417608),date(effectivetriedate)=2018-03-19T18:09:00Z))+1.0)


63372217#83#-2128821991=
8.499447 = boost(((title:narendra | keywords:narendra) (title:mod | keywords:mod))~1,1.0/(3.16E-11*float(ms(const(1524118478192),date(effectivetriedate)))+1.0)), product of:
  9.207306 = sum of:
    9.207306 = max of:
      9.207306 = weight(title:mod in 90314) [SchemaSimilarity], result of:
        9.207306 = score(doc=90314,freq=1.0 = termFreq=1.0), product of:
          9.534913 = idf(docFreq=306, docCount=4240239)
          0.96564126 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.75 = parameter b
            6.5421023 = avgFieldLength
            7.111111 = fieldLength
      8.90691 = weight(keywords:mod in 90314) [SchemaSimilarity], result of:
        8.90691 = score(doc=90314,freq=1.0 = termFreq=1.0), product of:
          8.90691 = idf(docFreq=320, docCount=2366191)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.0 = parameter b (norms omitted for field)

Please suggest how to get consistent scores here.

Which version of Solr? - MatsLindh

1 Answer


The default behavior in Solr is that the score is calculated on each shard by itself, before the results are merged higher up in the hierarchy. This assumes that you have a decent number of documents, and that the documents are spread randomly (and evenly) across your shards regardless of their content. If that's the case, the scores should be similar enough to avoid causing any larger issues.
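To make the effect of shard-local statistics concrete, here is a small Python sketch that recomputes the `title:mod` weight from the numbers in the explain output in the question. It assumes the BM25 formulas used by Lucene's BM25Similarity in the versions that print a `tfNorm` line (the Solr version is not stated in the question, so treat that as an assumption); the function names are my own and not part of any Solr API.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF as assumed here: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, field_length: float, avg_field_length: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """tfNorm as assumed here, using the k1 and b values shown in the explain output."""
    length_factor = 1.0 - b + b * field_length / avg_field_length
    return (freq * (k1 + 1.0)) / (freq + k1 * length_factor)


# title:mod on master/slave: docFreq=805, docCount=10322376, avgFieldLength=6.5560484
master_weight = bm25_idf(805, 10322376) * bm25_tf_norm(1.0, 7.111111, 6.5560484)

# title:mod on one SolrCloud node: docFreq=298, docCount=4078658, avgFieldLength=6.4863324
cloud_weight = bm25_idf(298, 4078658) * bm25_tf_norm(1.0, 7.111111, 6.4863324)

print(f"master/slave weight: {master_weight:.6f}")
print(f"SolrCloud weight:    {cloud_weight:.6f}")
```

If the assumed formulas match your version, the two printed values should land very close to the 9.141734 and 9.161503 reported in the explain output. The document and the query are identical; only `docFreq`, `docCount` and `avgFieldLength` differ.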

In your case this is what's creating the different scores. In your "single node answering a query" case (i.e. your master/slave setup), the counts differ from those in a cluster (SolrCloud) setup: in the cluster setup the documents are distributed across multiple servers and, by default, each node uses only its local counts for scoring. Comparing scores across queries (especially with recency boosts that change as time passes) is also hard. My guess is that the scores you have are so close to each other for those documents that it's really up in the air which one of them is ranked as most relevant, and the score changes depending on how many documents are present in each shard (i.e. adding another document to one of the shards changes the score locally for that shard).
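The recency boost part can be checked the same way. This is a minimal sketch of the `1.0/(3.16E-11*ms(NOW, date)+1.0)` expression from the explain output, using the two `const(...)` timestamps and the document date shown in the question; `recency_boost` is a name I made up for illustration.

```python
from datetime import datetime, timezone


def recency_boost(now_ms: int, doc_date: datetime, m: float = 3.16e-11) -> float:
    """Evaluate 1.0 / (m * (now - doc_date in ms) + 1.0), as in the boost function."""
    doc_ms = int(doc_date.timestamp() * 1000)
    return 1.0 / (m * (now_ms - doc_ms) + 1.0)


# effectivetriedate of document 63372217#83#-2128821991
doc_date = datetime(2018, 3, 19, 18, 9, 0, tzinfo=timezone.utc)

# NOW values taken from const(...) in the master/slave and first SolrCloud request
boost_first = recency_boost(1524117881692, doc_date)
boost_later = recency_boost(1524118417608, doc_date)

print(f"boost at first request: {boost_first:.7f}")
print(f"boost at later request: {boost_later:.7f}")
```

The second request happened about nine minutes later, so the boost is slightly smaller. The change is tiny compared to the difference caused by the term statistics, but it means scores from different requests are never exactly comparable.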

A possible solution is to use a distributed IDF - i.e. a scoring method that uses frequency across the whole collection instead of just for the local shard. This is done by configuring the stats cache to use ExactStatsCache, ExactSharedStatsCache or LRUStatsCache instead of the default LocalStatsCache. The LocalStatsCache is described as:

LocalStatsCache: This only uses local term and document statistics to compute relevance. In cases with uniform term distribution across shards, this works reasonably well. This option is the default if no <statsCache> is configured.

While the description for ExactStatsCache explains that it uses collection-wide values:

ExactStatsCache: This implementation uses global values (across the collection) for document frequency.

The other two are different caching implementations of the ExactStatsCache.

You can change the stats cache implementation in solrconfig.xml:

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
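Conceptually, the idea behind global statistics can be sketched like this: add up the per-shard document frequency and document count for a term, and compute the IDF once from the totals so every shard scores with the same value. This is only an illustration of the idea, not Solr's actual implementation; the shard names and numbers are hypothetical, and the IDF formula is the BM25 one assumed above.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Assumed BM25 IDF: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical (docFreq, docCount) per shard for a single term
shard_stats = {
    "shard1": (300, 4_000_000),
    "shard2": (250, 3_000_000),
    "shard3": (255, 3_300_000),
}

# Local behavior: every shard computes its own IDF from its own counts
local_idfs = {name: bm25_idf(df, n) for name, (df, n) in shard_stats.items()}

# Global behavior: sum the counts first, then compute a single IDF
global_df = sum(df for df, _ in shard_stats.values())
global_count = sum(n for _, n in shard_stats.values())
global_idf = bm25_idf(global_df, global_count)

for name, value in local_idfs.items():
    print(f"{name}: local idf = {value:.4f}")
print(f"global idf = {global_idf:.4f}")
```

With local statistics the same term gets three different IDF values depending on which shard holds the document; with the summed counts there is a single value, which is what makes scores comparable across shards.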