I have a small 2-node cluster:

  1. node1 is "always on" and runs on the production server.

  2. node2 is "sometimes on" and runs on a notebook for development purposes.

Both have a simple unicast config:

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "other node IP" ]

  1. So node2 is very often disconnected from node1.

  2. node1 is USUALLY the master, but SOMETIMES node2 gets elected as master.

Usually node1 has newer data than node2; after they connect, all shards are synchronized from node1 to node2, and that's fine.

But if node2 has newer data, that data is simply lost. And if node2 occasionally becomes master, it can wipe out new data on production.

I cannot set node.master: false on the notebook, because then it stops working while it is disconnected from node1.

Is there a way to set up master-master synchronization with optimistic merging of the documents in an index (newer wins)?

Or do I need an additional third node (node.data: false, node.master: true) in the middle?

So what should I do?

For production, the answer is to follow the suggestions at blog.trifork.com/2013/10/24/…, but that still doesn't say how to deal with a "sometimes" connected node such as a developer notebook. Maybe it's impossible, and I must use a custom synchronization utility instead of joining them in one cluster? - comdiv
Why would you connect a development node to a production node? - Julien C.
If you want a copy of the data, it is much better to use the snapshot functionality provided by Elasticsearch. - Jettro Coenradie
To juliendangers: ES holds not only user documents but also some metadata and master data for the application (ES is used as a NoSQL DB for the app), so some of the data is prepared on the notebook during development. When it's online, all is OK, but when it's offline I lose data. From what I've read online and here, this is the well-known split-brain problem, so there's no magic: I have to run an isolated development cluster and sync it manually if required. - comdiv

Answer

This is an interesting set-up, but not one I would recommend in the long run, as you are putting your production node under stress very often.

First off, the term "development" in this case makes little sense because as far as ES is concerned, you're adding a "production" node and killing a "production" node every time. Most of what you do on your "development" node will affect your "production" node.

That said, here's what I would suggest you try:

  1. You can set the "development" node to hold no data with node.data: false and prevent it from ever becoming master with node.master: false. When your development node joins, ES will then not start moving shards around, but you will still be able to query through that node. In this configuration, set the number of replicas to 0 on all indexes so that your cluster stays in "green" health. Note that all of the data is then stored on the "production" node only: if that node goes down, you will have data loss.

  2. If you really want the development node to hold a replica of the data, make sure you set node.master: false on it and that all your indexes have a replica count of 1. This way, your "production" node always has a copy of your data, and no data is lost when your development node goes offline. When your development node comes back online, it automatically syncs with the "production" node so that its data (its replicas) is up to date. Depending on how much data you have, this may take some time, but it is generally quick. Again, beware that any query you run on your "development" node also affects your production node, so this is probably not a good idea in the long run. If you can afford it, it's much better to have at the very least 2 nodes with 1 replica on each index, ideally 3+.
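A minimal sketch of the notebook's elasticsearch.yml for both options above, assuming the same Elasticsearch release as the discovery.zen.* settings in the question (later releases replaced node.master/node.data with node.roles). The unicast discovery lines stay as they are; only the node-role settings change.

```yaml
# elasticsearch.yml on the notebook ("development") node.
# Keep the existing discovery.zen.ping.* unicast settings unchanged.

# Both options: never eligible to be elected master, so node1 stays the only master.
node.master: false

# Option 1: hold no shards; the node only forwards requests and merges results.
# Pair this with number_of_replicas: 0 on every index.
node.data: false

# Option 2: allow replica shards to be allocated here while the notebook is connected.
# Pair this with number_of_replicas: 1 on every index.
# node.data: true
```

The replica count is an index setting, not a node setting. For existing indexes it can be changed through the update index settings API, for example a PUT to /_settings with the body {"index": {"number_of_replicas": 0}} for option 1, or 1 for option 2.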