I have an Erlang application that uses Mnesia to store some basic state that defines the users and roles of our system. We need to roll out a new feature that requires extending the record schema stored in one of our Mnesia tables.

Our deployment plan was to take one node out of the cluster (just by removing it from the network), deploy the code, run a script to upgrade the record schema on that node, and then bring it back into service. However, once I upgrade the records on this node, the change replicates to the other nodes, and certain operations begin failing on those nodes because of the mismatched record schema. Obviously a BIG PROBLEM for zero-downtime deployments.

Is there a way to isolate my schema changes so that the schema upgrade can be run on each node as it is upgraded? Preferably this would apply only to the table being upgraded, allowing the other tables to keep replicating. However, I could live with shutting off replication between all nodes for the few minutes it takes for us to deploy to all nodes.

1 Answer

I had this exact problem. The only way I was able to solve it was to take every node but one out of the cluster, then upgrade the schema and code on that remaining "master" node, which can hopefully be done while it stays live. After that, for each remaining node, I deleted its database files, upgraded the code, and brought the node up (creating the tables with the correct new schema) and back into the cluster.
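For illustration, the schema upgrade on the remaining "master" node could look roughly like the sketch below. It uses `mnesia:transform_table/3`, which rewrites every record in a table and updates its attribute list. The table name `user`, its fields, and the module name `user_upgrade` are assumptions made up for this example; the original post doesn't say what the records look like.

```erlang
-module(user_upgrade).
-export([upgrade/0, transform/1]).

%% ASSUMPTION: hypothetical record shapes, not taken from the original post.
%% Old: {user, Id, Name}
%% New: {user, Id, Name, Roles}

%% Pure conversion from the old tuple shape to the new one.
%% Records already in the new shape are passed through unchanged.
transform({user, Id, Name}) ->
    {user, Id, Name, []};
transform({user, _Id, _Name, _Roles} = Already) ->
    Already.

%% Rewrite all records in the (hypothetical) user table and
%% register the new attribute list with Mnesia.
upgrade() ->
    case mnesia:transform_table(user, fun ?MODULE:transform/1,
                                [id, name, roles]) of
        {atomic, ok}      -> ok;
        {aborted, Reason} -> {error, Reason}
    end.
```

Keeping `transform/1` as a separate exported function means the conversion can be checked against sample tuples before it is run on real data.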

I used an escript I wrote that adds and removes nodes from a cluster to make this easier, and an Ansible playbook to orchestrate it. I really don't want to do that again any time soon.
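This isn't the escript itself, but a minimal sketch of the kind of calls such a script might make, using standard Mnesia functions. The module name `cluster_join`, the function names, and the choice of `disc_copies` are assumptions for the example. Note that this variant copies the tables over from the live node rather than creating them locally.

```erlang
-module(cluster_join).
-export([leave/1, join/2]).

%% Run on a live node AFTER Mnesia has been stopped on Node.
%% Removes Node from the cluster's schema.
leave(Node) ->
    case mnesia:del_table_copy(schema, Node) of
        {atomic, ok}      -> ok;
        {aborted, Reason} -> {error, Reason}
    end.

%% Run on the upgraded node after its old Mnesia directory has
%% been deleted. Master is the live node, Tables the tables to
%% replicate locally (both supplied by the caller).
join(Master, Tables) ->
    ok = mnesia:start(),
    %% Connect this empty node to the running cluster.
    {ok, [Master]} = mnesia:change_config(extra_db_nodes, [Master]),
    %% Persist the schema on disc so the node survives a restart.
    {atomic, ok} = mnesia:change_table_copy_type(schema, node(), disc_copies),
    %% Pull a disc copy of each table from the cluster.
    lists:foreach(
      fun(Tab) ->
              {atomic, ok} = mnesia:add_table_copy(Tab, node(), disc_copies)
      end,
      Tables),
    mnesia:wait_for_tables(Tables, 60000).
```

An orchestration tool such as the Ansible playbook mentioned above would then call `leave/1` on the live node and `join/2` on each upgraded node in turn.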

The essential problem is that Mnesia doesn't have schema versioning; if it did, this could be done in a much better way.