One replicated mnesia table has become out-of-sync

Question

I have an Erlang application currently running on four nodes with a replicated Mnesia database that stores minimal data about connected clients. The Mnesia replication has worked seamlessly in the past (as far as I know, anyway), but a client recently noticed that one of the nodes is missing some ids related to their application.

I'm not really sure how this happened. Our network may have had a hiccup at the time. What is more urgent right now is getting the data into a good state across all nodes. Is there a way to tell Mnesia to replicate from a known-good node?

loxs · Accepted Answer · 2014-01-01T08:25:27

Mnesia is notorious for this issue. It's a huge PITA.

Looking at it from the CAP theorem's point of view, most systems built with Mnesia end up being C-A (consistency and availability, with no partition tolerance) systems. Most of the time you have (and heavily rely on) its hard consistency. Then a network partition happens. Each side of the partition is still available for writes, but these writes destroy consistency. And later on, once the partition heals, Mnesia has no mechanism for automatic data repair.

Everyone who uses Mnesia in a cluster should familiarize themselves with these tradeoffs. Your problem is a clear sign that using Mnesia was a poor choice. Doubly so if this data is critical to you.
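Mnesia won't repair the data, but a node can at least notice that a partition has happened. The following is a minimal sketch, assuming Mnesia is already running on the node; the module name `partition_watch` is made up for illustration. It subscribes to Mnesia's system events and logs any `inconsistent_database` report.

```erlang
-module(partition_watch).
-export([start/0]).

%% Spawn a process that subscribes to Mnesia system events.
%% Assumes mnesia is already started on this node; otherwise
%% mnesia:subscribe/1 returns an error and the match below fails.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

loop() ->
    receive
        %% Context is running_partitioned_network or
        %% starting_partitioned_network.
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:error_msg(
                "Mnesia inconsistent (~p) with node ~p~n",
                [Context, Node]),
            loop();
        _Other ->
            %% Ignore all other system events.
            loop()
    end.
```

This only tells you that the replicas may have diverged. Deciding which side's data wins is still up to you.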

I too use Mnesia in such a way (sometimes we all need speed, you know). But I make sure to only use it to store data that I can easily reconstruct. In general, if you need the data stored on disk, Mnesia is no good, except for toy projects.

I make sure to always have this function at hand:

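reinit_mnesia_cluster() ->
    %% Stop Mnesia on every node; the schema can only be deleted while it is stopped.
    rpc:multicall(mnesia, stop, []),
    AllNodes = [node() | nodes()],
    %% Wipe the schema (and with it all tables) on all nodes, then recreate it empty.
    mnesia:delete_schema(AllNodes),
    mnesia:create_schema(AllNodes),
    %% Start Mnesia again everywhere.
    rpc:multicall(mnesia, start, []).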

Use it only after the network partition has been resolved and all nodes are reachable. This will erase all Mnesia replicas on every node and start the cluster from scratch. Again, if you can't live with what it does, then using Mnesia was a poor choice.
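If you would rather keep the data on a known-good node, a less drastic option is Mnesia's master-node setting, which forces a table to be loaded from specific nodes at startup. This is a sketch and not part of the function above: the module name `mnesia_resync` and the arguments `Tab` and `GoodNode` are placeholders, and it assumes it is run on the node that holds the stale replica.

```erlang
-module(mnesia_resync).
-export([reload_from/2]).

%% Run on the node with the stale replica.
%% Tab is the out-of-sync table, GoodNode a node whose copy you trust
%% (both are placeholders for illustration).
reload_from(Tab, GoodNode) ->
    %% At the next startup, load Tab from GoodNode instead of using
    %% the normal table load algorithm.
    ok = mnesia:set_master_nodes(Tab, [GoodNode]),
    %% Restart Mnesia locally so the table is loaded again.
    stopped = mnesia:stop(),
    ok = mnesia:start(),
    %% Block until the table has been loaded, or fail after 60 seconds.
    ok = mnesia:wait_for_tables([Tab], 60000).
```

Any writes that reached only the stale node during the partition are thrown away by this. Once things are back in sync, calling `mnesia:set_master_nodes(Tab, [])` should restore the default load behaviour, but check that against the Mnesia documentation for your OTP version.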

For important data that needs hard consistency, use SQL. For important data that needs availability, use Riak. For shared state that needs speed, use Redis. Mnesia is no replacement for these systems, although at first it does seem so.

Edit on 2014-11-16: Here is a much better article on the topic, explaining in detail what I said above: https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850
