
We're using Solr 3.6 replication with 2 servers (a master and a slave), and we're currently looking for a way to do clean backups.

As the wiki says, we can use an HTTP command to create a snapshot of the master: http://myMasterHost/solr/replication?command=backup
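A minimal Python sketch of sending that request from a script (for example from cron), using only the standard library. `replication_url` and `trigger_backup` are hypothetical helper names, and `http://myMasterHost/solr` is just the placeholder host from the URL above:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, command, **params):
    """Build a ReplicationHandler URL such as .../replication?command=backup."""
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/replication?{query}"


def trigger_backup(base_url, timeout=30):
    """Ask the master to start a backup and return the raw response body.

    The backup runs asynchronously on the server, so a successful response
    only means the request was accepted, not that the snapshot is complete.
    """
    with urlopen(replication_url(base_url, "backup"), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage would be `trigger_backup("http://myMasterHost/solr")`; any extra request parameters your Solr version supports can be passed through `replication_url` as keyword arguments.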

But we still have some questions:

  • What is the benefit of the backup command over a classic shell script that copies the index files?

  • The command only backs up the index; is it possible to also copy the spellchecker folder? Is it needed?

  • Can we create the snapshot while the application is running, i.e. while index updates may be happening?

  • When we have to restore the servers from the backup, what do we have to do on the slave?
    • just copy the snapshot into its index folder and remove the replication.properties file (or not)?
    • ask for a fetchindex through the HTTP command http://mySlave/solr/replication?command=fetchindex?
    • just empty the slave's index folder to force a full replication from the master?

You can use the backup command provided by the ReplicationHandler. It runs asynchronously and can take a while if your index is big, but you don't need to shut down Solr. When it's done, you'll find a new directory named snapshot.yyyyMMddHHmmss (the backup date) in Solr's data directory, next to the index directory. You can also configure how many old backups you want to keep.
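If you would rather enforce retention from outside Solr, a sketch like the following prunes old backup directories. Note that this is a swapped-in, client-side approach, not Solr's own retention setting. `prune_snapshots` is a hypothetical helper; it assumes the backup directories share a `snapshot.` prefix followed by a fixed-width timestamp (pass a different `prefix` if yours are named differently), so that sorting the names also sorts them by date:

```python
import shutil
from pathlib import Path


def prune_snapshots(data_dir, keep, prefix="snapshot."):
    """Delete all but the `keep` newest backup directories in data_dir.

    Returns the list of removed paths. Relies on the fixed-width timestamp
    suffix: lexicographic order of the names is chronological order.
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    snapshots = sorted(
        (p for p in Path(data_dir).iterdir()
         if p.is_dir() and p.name.startswith(prefix)),
        key=lambda p: p.name,
    )
    # Everything except the `keep` newest entries (never a negative slice).
    to_remove = snapshots[:max(len(snapshots) - keep, 0)]
    for path in to_remove:
        shutil.rmtree(path)
    return to_remove
```

Keeping at least one directory (`keep >= 1`) also avoids deleting a backup that may still be in progress, since that one is always the newest.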

After that, it's of course better to move the backup to a safe location, preferably on a different server.
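A minimal sketch of that step, assuming the destination is a path reachable from the Solr host (for example a mounted network share; for a separate server you'd more typically use rsync or scp). `latest_snapshot` and `copy_latest_snapshot` are hypothetical helpers, and the `snapshot.` prefix is an assumption you can override. Since the backup is asynchronous, run this only once the backup has finished, or you may copy an incomplete snapshot:

```python
import shutil
from pathlib import Path


def latest_snapshot(data_dir, prefix="snapshot."):
    """Return the newest backup directory in data_dir, or None if there is none."""
    snapshots = sorted(
        (p for p in Path(data_dir).iterdir()
         if p.is_dir() and p.name.startswith(prefix)),
        key=lambda p: p.name,  # fixed-width timestamps sort chronologically
    )
    return snapshots[-1] if snapshots else None


def copy_latest_snapshot(data_dir, dest_root, prefix="snapshot."):
    """Copy the newest backup into dest_root, keeping its directory name."""
    src = latest_snapshot(data_dir, prefix)
    if src is None:
        raise FileNotFoundError(f"no {prefix}* directory in {data_dir}")
    dest = Path(dest_root) / src.name
    shutil.copytree(src, dest)  # raises FileExistsError if dest already exists
    return dest
```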

I don't think it's possible to back up the spellchecker with this command, though I'm not completely sure.

Yes, the command is meant to be run while the application is running. The only catch is that the backup will probably not include documents you committed after the backup started.

You can also have a look at Lucene's CheckIndex tool: once you've backed up the index, you can use it to check that the backup is OK.
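For example, a small Python wrapper that runs CheckIndex on a backup directory. The helper names and the jar path are assumptions; point it at the Lucene core jar that ships with your Solr and at the snapshot directory. It relies on CheckIndex exiting with status 0 when the index is clean:

```python
import subprocess


def checkindex_command(lucene_core_jar, index_dir):
    """Build the command line that runs Lucene's CheckIndex on index_dir."""
    return ["java", "-cp", str(lucene_core_jar),
            "org.apache.lucene.index.CheckIndex", str(index_dir)]


def index_is_clean(lucene_core_jar, index_dir):
    """Run CheckIndex (read-only, no -fix) and return True if it reports no problems.

    -fix is deliberately not passed: it removes broken segments, and the
    documents in them, instead of just reporting them.
    """
    result = subprocess.run(checkindex_command(lucene_core_jar, index_dir),
                            capture_output=True, text=True)
    return result.returncode == 0
```

Usage, with hypothetical paths: `index_is_clean("/opt/solr/lib/lucene-core-3.6.0.jar", "/var/solr/data/snapshot.20120101120000")`. The full report is available in `result.stdout` if you want to log it.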

Personally, I wouldn't use the backups to restore the index on the slaves if you already have a good index on the master. The standard replication process copies the index automatically (it really is just a copy of the index segments), so you don't need to copy them manually unless the backup contains better data than the master.
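If you want the slave to pull from the master right away instead of waiting for its next poll, you can send it the `fetchindex` command mentioned in the question. A minimal standard-library sketch; `fetchindex_url`, `trigger_fetchindex` and the `http://mySlave/solr` base URL are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def fetchindex_url(slave_base_url):
    """Build the slave's ReplicationHandler URL for command=fetchindex."""
    query = urlencode({"command": "fetchindex"})
    return f"{slave_base_url.rstrip('/')}/replication?{query}"


def trigger_fetchindex(slave_base_url, timeout=30):
    """Ask the slave to fetch the current index from its configured master.

    Returns the raw response body. The response may arrive before the
    replication itself has finished.
    """
    with urlopen(fetchindex_url(slave_base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: `trigger_fetchindex("http://mySlave/solr")`.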