SolrCloud and updates that require index rebuild and/or modify code

Question

SolrCloud, thanks to ZooKeeper integration, has some nice utilities for managing and reloading core/collection configuration.

However, this only fully covers trivial updates; there are also nontrivial ones. Nontrivial here means changes that make an updated node and/or its cores incompatible with some previous state.

In particular, these subcases come to mind:

A code update necessitating a restart of an underlying Solr instance.
A schema change that requires a full rebuild of a core.

My question is: how can SolrCloud and the associated ZooKeeper services be used to make such updates easier and more reliable, and/or to ensure higher availability?

Note: I was hoping for some APIs/functionality that "understands" such updates. So far the most notable thing I've found is collection aliasing in the Collections API, which would allow a smoother transition between the "old" and "new" versions; a little disappointing given those hopes.

migueldiab · Accepted Answer · 2013-12-19T19:07:05

I am not sure what you mean by:

A code update necessitating a restart of an underlying Solr instance.
Do you mean that the Solr code changed (e.g., a newer version)? Or that the application accessing the Solr instance changed (e.g., your codebase)?

In the first scenario, bringing up a new instance and registering it with ZooKeeper, even if its version differs, should be the end of it.
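To make the "bring up a new instance" route concrete, here is a minimal sketch of replacing nodes one at a time, retiring each old node only after its replacement reports as active. The callables `start_replacement`, `is_active` and `stop_node` are hypothetical hooks you would implement with your own deployment tooling; the sketch does not talk to Solr itself. Availability during the swap also assumes every shard has a replica on some other node.

```python
import time
from typing import Callable, Iterable, List


def rolling_replace(
    old_nodes: Iterable[str],
    start_replacement: Callable[[str], str],  # hypothetical: launch upgraded node, return its id
    is_active: Callable[[str], bool],         # hypothetical: True once the node serves its replicas
    stop_node: Callable[[str], None],         # hypothetical: shut an old node down
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
) -> List[str]:
    """Swap nodes one at a time so at most one old node is ever being retired."""
    new_nodes: List[str] = []
    for old in old_nodes:
        # e.g. start the upgraded Solr pointed at the same ZooKeeper ensemble (-DzkHost=...)
        new = start_replacement(old)
        deadline = time.monotonic() + timeout_s
        while not is_active(new):
            if time.monotonic() > deadline:
                # Leave the old node running so the cluster keeps serving.
                raise TimeoutError(f"{new} did not become active; {old} left running")
            time.sleep(poll_s)
        # Only retire the old node once its replacement is up.
        stop_node(old)
        new_nodes.append(new)
    return new_nodes
```

Stopping at the first replacement that never becomes healthy is a deliberate choice: the old node stays in service, so a bad upgrade degrades capacity rather than availability.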

In the second case, it really doesn't matter to Solr what happens to the application accessing the data, right?

Then you mention what I believe is the most "common" scenario:

A schema change that requires a full rebuild of a core.

If you are changing the schema, and this implies changing some of your indexes, fields and/or metadata, you can't really expect Solr to be agnostic of the change and keep running and returning results, because the already-indexed documents no longer correspond to the structures the new schema describes.

I think the best approach here is to identify the depth of the changes and then choose one of two paths. Either reindex the data into a new index with the updated structure and make the required changes to your application so that it queries the new structures, or, if a downtime window is allowed, simply delete and rebuild the whole thing (though this works against your "higher availability" requirement).
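The "reindex into a new index, then switch" path pairs naturally with the collection aliasing mentioned in the question. Below is a minimal sketch that only builds the two Solr Collections API requests involved (`action=CREATE` and `action=CREATEALIAS`); the reindex between them is application-specific and not shown. It assumes clients already query through an alias, that the updated config set has been uploaded to ZooKeeper beforehand, and that names such as `products`, `products_v2` and `products_conf_v2` are purely illustrative.

```python
from typing import Tuple
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a Solr Collections API URL such as .../admin/collections?action=CREATE&..."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def rebuild_behind_alias(
    base_url: str, alias: str, new_collection: str, config_name: str, num_shards: int
) -> Tuple[str, str]:
    """Return (create_url, switch_url); reindex into new_collection between the two calls."""
    # Step 1: create the new collection from the updated config stored in ZooKeeper.
    create_url = collections_api_url(
        base_url,
        "CREATE",
        name=new_collection,
        numShards=str(num_shards),
        **{"collection.configName": config_name},
    )
    # Step 2 (not shown): reindex every document into new_collection.
    # Step 3: point the alias clients query through at the freshly built collection.
    switch_url = collections_api_url(base_url, "CREATEALIAS", name=alias, collections=new_collection)
    return create_url, switch_url
```

You would send each URL with any HTTP client (for example `urllib.request.urlopen`), calling the second only after the reindex has finished. Once traffic is served from the new collection, the old one can be dropped with `action=DELETE`.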

I think it is similar to hot-updating a SQL table while two versions of the application use the old and new structures side by side: it can be done with extreme care, but you will be better off keeping them apart if you can.

Not sure if this helps, cheers,

Mike.
