
I am running a Solr 7.4 cluster with 2 nodes, 9 shards, and 2 replicas per shard.

When one of the servers crashes, I see messages like this (Skipping download for _3nap.fnm because it already exists) in the logs:

2019-04-16 09:20:21.333 INFO  (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.fnm because it already exists
2019-04-16 09:20:35.265 INFO  (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.dim because it already exists
2019-04-16 09:20:51.437 INFO  (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.si because it already exists
2019-04-16 09:21:00.528 INFO  (qtp1543148593-32) [c:telegram_channel_post_archive s:shard20 r:core_node41 x:telegram_channel_post_archive_shard20_replica_n38] o.a.s.u.p.LogUpdateProcessorFactory [telegram_channel_post_archive_shard20_replica_n38]  webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=dedupe&distrib.from=http://192.168.1.1:4239/solr/telegram_channel_post_archive_shard20_replica_n83/&min_rf=2&wt=javabin&version=2}{add=[9734588300_4723 (1630961769251864576), 9734588300_4693 (1630961769253961728), 9734588300_4670 (1630961769255010304), 9734588300_4656 (1630961769255010305)]} 0 80197

How does the recovery process work in Solr?

Will it transfer all the documents in the shard, or only the missing parts?

I found this note in the documentation:

If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a synch process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.

but I don't understand this part: what does "If a replica is too far out of sync" mean?


1 Answer


The note just says that Solr will attempt to sync as little as possible, but if that's not possible - i.e. the replica is so far behind that the transaction log can no longer bring it up to date - the complete set of index files will be replicated to the replica. This takes longer than a regular sync.
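To make "too far out of sync" a bit more concrete: peer sync catches a replica up from the leader's transaction log, and how much history that log keeps is configured in solrconfig.xml. The snippet below is only an illustrative sketch; numRecordsToKeep and maxNumLogsToKeep are the standard updateLog settings, but the values shown are examples, not your configuration:

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <!-- recent updates retained per log; peer sync can only catch up
           a replica whose missed updates still fit in this window -->
      <int name="numRecordsToKeep">500</int>
      <!-- how many old transaction log files to keep around -->
      <int name="maxNumLogsToKeep">20</int>
    </updateLog>
  </updateHandler>

If the replica missed more updates than the leader still holds in its transaction log, peer sync cannot succeed and Solr falls back to copying the full set of index files.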

The message you're getting means that the file in question is already present on the replica, so it doesn't have to be downloaded from the leader again.
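If you want to watch what the recovering core is actually fetching, the standard replication handler has a details command; the host and core name below are simply taken from your own log lines:

  http://192.168.1.2:4239/solr/telegram_channel_post_archive_shard5_replica_n53/replication?command=details

While a fetch is in progress it reports, among other things, whether replication is running, the current file, and how much is still left to download, which makes it easy to tell a quick catch-up from a full index copy.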