Sharding is important for two reasons:
- It allows you to split (scale) your content volume horizontally.
- It allows you to distribute operations (for example, index tracking)
across shards, potentially on multiple nodes, which increases
performance and throughput.
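To make the idea of horizontal splitting concrete, here is a minimal sketch of hash-based routing: each document ID is hashed and mapped to one of N shards. This is only an illustration, not Solr's actual routing implementation; the function names `shard_for` and `route` and the use of MD5 with a modulo are assumptions made for this example.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash.

    Hypothetical helper: a real search engine uses its own router.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Take the first 4 bytes as an unsigned integer and wrap it to a shard index.
    return int.from_bytes(digest[:4], "big") % num_shards


def route(doc_ids, num_shards):
    """Group document IDs by the shard they would be stored on."""
    shards = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    docs = [f"doc-{n}" for n in range(10)]
    for shard, ids in route(docs, 3).items():
        print(f"shard{shard + 1}: {ids}")
```

Because the hash is stable, the same document ID always lands on the same shard, so both indexing and lookups know where to go without a central directory.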
Replication: Replication serves two purposes: it ensures high availability and it improves search query performance, although fault tolerance is usually the main goal. Fault tolerance is achieved by never storing a replica shard on the same node as its primary shard, so losing one node never takes out both copies.
Advantages of replication:
- Separates read load from write load
- Distributes the load of search queries
- Keeps search highly available
- Any number of slave instances can be created to scale query performance
It is advisable to set the replication factor to at least 3, so that even if something happens to a rack, one copy is always safe.
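In Solr, the replication factor is usually set when the collection is created through the Collections API. The sketch below only builds the request URL and does not send it. The host `localhost:8983`, the collection name `mycollection`, and the helper `create_collection_url` are assumptions for illustration; depending on your Solr version you may need additional parameters.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL (hypothetical helper)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance and collection name.
    print(create_collection_url("http://localhost:8983/solr", "mycollection", 3, 3))
```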
Suppose you have three Solr server instances called server1, server2 and server3.
You have created three shards for your collection.
Each server hosts the primary copy of one shard: shard1 on server1, shard2 on server2 and shard3 on server3.
Now keep three copies of each shard, one on each server.
So server1 holds shard1 plus replicas of the other shards, shard2 and shard3.
The same goes for the other servers.
If two servers go down, you still have one server with all the data of your collection.
That's the beauty of replication in achieving high availability.
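The three-server example can be checked with a small simulation. This is a sketch of the reasoning only, not anything Solr runs: the helpers `surviving_shards` and `tolerates` are made up for this example, and the placement dictionaries simply model which server holds a copy of which shard.

```python
from itertools import combinations

servers = ["server1", "server2", "server3"]
shards = ["shard1", "shard2", "shard3"]

# Replication factor 3: every server holds one copy of every shard.
replicated = {server: set(shards) for server in servers}

# No replication: each server holds only its own shard.
unreplicated = {server: {shard} for server, shard in zip(servers, shards)}


def surviving_shards(placement, failed):
    """Return the shards still reachable after the given servers fail."""
    alive = set()
    for server, held in placement.items():
        if server not in failed:
            alive |= held
    return alive


def tolerates(placement, all_shards, failures):
    """True if every combination of `failures` failed servers leaves all shards available."""
    return all(
        surviving_shards(placement, set(failed)) == set(all_shards)
        for failed in combinations(placement, failures)
    )


if __name__ == "__main__":
    print("replicated, 2 servers down:", tolerates(replicated, shards, 2))
    print("unreplicated, 1 server down:", tolerates(unreplicated, shards, 1))
```

With three copies of each shard, any two servers can fail and the collection is still complete. Without replicas, a single failure already loses a shard.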