I am running Solr cluster 7.4 with 2 nodes and 9 shards and 2 replicas for each shard.
When one of the servers crashes, I see this message ("Skipping download for _3nap.fnm because it already exists") in the logs:
2019-04-16 09:20:21.333 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.fnm because it already exists
2019-04-16 09:20:35.265 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.dim because it already exists
2019-04-16 09:20:51.437 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.si because it already exists
2019-04-16 09:21:00.528 INFO (qtp1543148593-32) [c:telegram_channel_post_archive s:shard20 r:core_node41 x:telegram_channel_post_archive_shard20_replica_n38] o.a.s.u.p.LogUpdateProcessorFactory [telegram_channel_post_archive_shard20_replica_n38] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=dedupe&distrib.from=http://192.168.1.1:4239/solr/telegram_channel_post_archive_shard20_replica_n83/&min_rf=2&wt=javabin&version=2}{add=[9734588300_4723 (1630961769251864576), 9734588300_4693 (1630961769253961728), 9734588300_4670 (1630961769255010304), 9734588300_4656 (1630961769255010305)]} 0 80197
How does the recovery process work in Solr?
Does it transfer all of the documents in the shard, or only the missing/broken parts?
I found this note in the documentation:
If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a synch process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.
but I don't understand this part. What does the following mean?
If a replica is too far out of sync
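In case it is relevant, this is how I currently watch which replicas are still recovering, using the Collections API CLUSTERSTATUS action. It is only a rough Python sketch; the node address and collection name are the ones from the logs above, and may need adjusting:

import requests

# Node address and collection name taken from the log lines above.
SOLR_NODE = "http://192.168.1.2:4239/solr"
COLLECTION = "telegram_channel_post_archive"

# CLUSTERSTATUS reports the state of every replica, e.g. "active",
# "recovering", "down" or "recovery_failed".
resp = requests.get(
    f"{SOLR_NODE}/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": COLLECTION, "wt": "json"},
)
resp.raise_for_status()

shards = resp.json()["cluster"]["collections"][COLLECTION]["shards"]
for shard_name, shard in shards.items():
    for replica_name, replica in shard["replicas"].items():
        if replica["state"] != "active":
            # Print any replica that has not come back to "active" yet.
            print(shard_name, replica_name, replica["state"], replica["node_name"])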