
Our architecture is SolrCloud 4.4 with one collection and several shards and replicas.
Lately, for some of the documents, we have been receiving the following exception:

org.apache.solr.common.SolrException: No active slice servicing hash code 7b50d0a2 in DocCollection(collection1)={
"shards":{
"shard1":{
  "range":"80000000-d554ffff",
  "state":"active",
  "replicas":{
    "core_node1":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.131:8983_solr",
      "base_url":"http://XX.XXX.XXX.131:8983/solr",
      "leader":"true"},
    "core_node7":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.131:9983_solr",
      "base_url":"http://XX.XXX.XXX.131:9983/solr"}}},
"shard2":{
  "range":"d5550000-2aa9ffff",
  "state":"active",
  "replicas":{
    "core_node5":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.133:8983_solr",
      "base_url":"http://XX.XXX.XXX.133:8983/solr"},
    "core_node8":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.132:8983_solr",
      "base_url":"http://XX.XXX.XXX.132:8983/solr",
      "leader":"true"}}},
"shard3":{
  "range":null,
  "state":"active",
  "replicas":{
    "core_node6":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.133:9983_solr",
      "base_url":"http://XX.XXX.XXX.133:9983/solr"},
    "core_node9":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.132:9983_solr",
      "base_url":"http://XX.XXX.XXX.132:9983/solr",
      "leader":"true"}}}},

"router":"compositeId"}

From reading about Solr and ZooKeeper, my understanding is that Solr tried to index a document onto a shard that was in a faulty state, and that is why the request failed. But when I look at the status in the web browser, all the shards are online and in a valid state.
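
For reference, the same cluster state can also be read programmatically with SolrJ. A minimal sketch (the ZooKeeper address is masked the same way as the IPs above):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;

public class DumpShardState {
    public static void main(String[] args) throws Exception {
        // Masked ZooKeeper address; point this at your own ensemble.
        CloudSolrServer server = new CloudSolrServer("XX.XXX.XXX.131:2181");
        server.connect();
        ClusterState state = server.getZkStateReader().getClusterState();
        for (Slice slice : state.getSlices("collection1")) {
            // A null range here would explain the exception above.
            System.out.println(slice.getName()
                + " range=" + slice.getRange()
                + " state=" + slice.getState());
        }
        server.shutdown();
    }
}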


1 Answer


For anyone concerned: after investigating and reading up on how ZooKeeper/Lucene/Solr work together, here is what I found.
When a SolrCloud collection has multiple shards, each shard owns a hash range, and every indexed document is routed to a specific shard by a key. That key is not the document id that was added to Solr; it is a hash code that Solr computes from the id and uses to decide which shard stores the document and, later, which shard to retrieve it from. So when Solr adds a document, it generates the hash for it and looks for the shard whose range covers that hash, as in the sketch below.
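
To make that concrete, here is a small sketch of that lookup for a plain (non-composite) document id, using the MurmurHash3 utility that ships in solr-solrj. The id is made up, and shard 3's range is the reconstructed value derived further down:

import org.apache.solr.common.util.Hash;

public class WhichShard {
    // Shard ranges from the clusterstate above, as signed 32-bit ints.
    // shard3's range was null; the value here is the reconstructed one.
    static final int[][] RANGES = {
        {0x80000000, 0xd554ffff},  // shard1
        {0xd5550000, 0x2aa9ffff},  // shard2
        {0x2aaa0000, 0x7fffffff},  // shard3 (null in the broken state)
    };

    public static void main(String[] args) {
        String id = "mydoc42";  // hypothetical document id
        // For a plain id (no '!' separator), the compositeId router
        // hashes the whole id with MurmurHash3.
        int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
        for (int i = 0; i < RANGES.length; i++) {
            if (hash >= RANGES[i][0] && hash <= RANGES[i][1]) {
                System.out.printf("id=%s hash=%08x -> shard%d%n", id, hash, i + 1);
            }
        }
    }
}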
In my question we can see that shard #3's range is null, meaning something bad has happened; that shard will not function properly (or at all).
I converted the shard range boundaries from hex to decimal and found the following:

shard 1: "range":"80000000-d554ffff", decimal: 2147483648 - 3579117567
shard 2: "range":"d5550000-2aa9ffff", decimal: 3579117568 - 715784191 (wraps past the top of the unsigned range)
shard 3: null, but should be "range":"2aaa0000-7fffffff", decimal: 715784192 - 2147483647
Hash from the exception: 7B50D0A2 -> 2068893858

The three ranges must tile the entire 32-bit hash ring, and shard 2 already ends at 2aa9ffff, so shard 3 has to start at 2aaa0000 and run to 7fffffff. The failing hash 7B50D0A2 falls exactly inside that missing range, which is why no active slice was found.
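
These conversions are easy to double-check with plain JDK calls, no Solr dependency needed:

public class RangeCheck {
    public static void main(String[] args) {
        // Unsigned hex -> decimal, as in the table above.
        long lo   = Long.parseLong("2aaa0000", 16);  // 715784192
        long hi   = Long.parseLong("7fffffff", 16);  // 2147483647
        long hash = Long.parseLong("7b50d0a2", 16);  // 2068893858
        // The failing hash falls squarely inside shard 3's missing range.
        System.out.println(lo <= hash && hash <= hi);  // true
    }
}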

We can see that shard #3 needs to be redefined with the proper range. The way to do it is to update clusterstate.json in ZooKeeper directly. Is it safe to do? I don't know.
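
For what it's worth, here is a sketch of what that update could look like with the plain ZooKeeper client. This is not a blessed procedure: back up /clusterstate.json first, stop indexing, and be aware that the Overseer also writes this znode. The address is masked like the IPs above:

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class FixShard3Range {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Masked ZooKeeper address; point this at your own ensemble.
        ZooKeeper zk = new ZooKeeper("XX.XXX.XXX.131:2181", 10000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();

        Stat stat = new Stat();
        byte[] raw = zk.getData("/clusterstate.json", false, stat);
        // Crude but sufficient here: only shard3 has a null range.
        String fixed = new String(raw, "UTF-8")
            .replace("\"range\":null", "\"range\":\"2aaa0000-7fffffff\"");
        // Conditional write: fails if the znode changed since we read it.
        zk.setData("/clusterstate.json", fixed.getBytes("UTF-8"), stat.getVersion());
        zk.close();
    }
}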