
I have a couple of Elasticsearch questions about client nodes:

  1. Can I say that any node with its HTTP port open can be treated as a "client" node, since we can search and index through it?

  2. Strictly speaking, we treat a node as a client node when cluster=false and data=false. If I set up 10 client nodes, do I need to do the routing on my client side? That is, if I specify clientOne:9200 in my code as the ES entry point, would clientOne forward some of the HTTP requests to the other client nodes? Otherwise clientOne would be under very high load. In other words, do client nodes communicate with each other?

  3. Once I have client nodes in the ES cluster, should I close the HTTP port on the other nodes, so that queries can only go to the client nodes?

  4. Is it necessary to set up both a data node and a client node on the same machine, or should I just let the data node act as a client node as well, since it's on the same machine anyway?

  5. If the ES cluster is indexed heavily and frequently but searched less often, then I don't need to set up client nodes, because client nodes are mainly good for gathering data. Is that right?

  6. For general search and indexing, should I use the HTTP port or the TCP port? What's the difference from the client's perspective?

It's not too difficult; there are just too many questions crammed into a single one. That lowers its overall quality, because it makes it hard for people looking for a specific subject to find the right answer to their problem. One question should be just one question: six questions should be six separate questions. – Val
Breaking this up would mean that if someone knows the answer, they will post it. As it is now, people might know the answer to one or two of the questions but not the rest, so they skip it. Also, your phrasing is a bit tough to parse, and describing the background a little more could help. – fabianvf

Answer

  1. Yes, you can send queries over HTTP to any node that has port 9200 open.

  2. With node.data: false and node.master: false, you get a "client node". These are useful for offloading indexing and search traffic from your data nodes. Client nodes do not spread incoming HTTP requests among themselves, so if you have 10 of them, you would want to put a load balancer in front of them.

  3. Closing the data nodes' HTTP port (http.enabled: false) would keep them from serving client requests (probably good), though it would also prevent you from curl'ing them directly for stats, etc.

  4. Client nodes are useful (see #2), so I wouldn't route traffic directly to your data nodes. Whether you run both a client node and a data node on the same machine depends on that machine's resources (sufficient RAM, etc.).

  5. Client nodes are also useful for indexing, because they know which data node should receive the data for storage. If you sent an indexing request to a random data node instead, the odds are high that it would have to forward that request to another node. That wastes time and resources when you could add client nodes instead.

  6. Having your clients join the cluster (over the TCP transport port) might give them access to more information about the cluster, but using HTTP gives them a more generic "black box" interface. With HTTP, you also don't have to keep your clients at the same version as your ES nodes.
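For point 2, one way to avoid sending all traffic to clientOne without running a separate load balancer is to rotate through the client nodes on the application side. Below is a minimal, single-threaded round-robin sketch; the hosts `client1` to `client3` and the class `RoundRobinNodes` are hypothetical, and the code only builds URLs, it sends nothing. Many official Elasticsearch clients can also be given a list of node addresses and spread requests across them, so check your client library's documentation before hand-rolling this.

```python
import itertools


class RoundRobinNodes:
    """Hand out node base URLs in turn so no single client node takes all traffic.

    Hypothetical helper for illustration; not thread-safe as written.
    """

    def __init__(self, base_urls):
        if not base_urls:
            raise ValueError("need at least one node URL")
        # itertools.cycle repeats the list endlessly: a, b, c, a, b, c, ...
        self._cycle = itertools.cycle(base_urls)

    def next_url(self, path):
        # Take the next node in the rotation and join it with the request path.
        return next(self._cycle).rstrip("/") + "/" + path.lstrip("/")


# Placeholder hosts; replace with your own client nodes.
nodes = RoundRobinNodes([
    "http://client1:9200",
    "http://client2:9200",
    "http://client3:9200",
])

for _ in range(4):
    print(nodes.next_url("/_cluster/health"))
```

This prints the health URL for client1, client2 and client3, then client1 again. A real load balancer in front of the nodes has the advantage of also handling health checks and failover, which this sketch does not.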
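To illustrate point 5: with default routing, Elasticsearch picks the primary shard for a document from a hash of its routing value (the document `_id` unless you supply one), classically described as `shard = hash(_routing) % number_of_primary_shards`, and then uses the cluster state to find which node holds that shard. The sketch below mimics that step under loud assumptions: `zlib.crc32` stands in for Elasticsearch's real Murmur3-based hash, newer versions refine the formula, and the shard-to-node table is made up, so the numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Return the primary shard a document would be assigned to.

    ASSUMPTION: zlib.crc32 is only a stand-in for Elasticsearch's Murmur3 hash,
    so results are illustrative and will not match a real cluster.
    """
    h = zlib.crc32(routing_key.encode("utf-8"))  # non-negative in Python 3
    return h % num_primary_shards


# Hypothetical placement of 3 primary shards on 3 data nodes.
shard_to_node = {0: "data-node-a", 1: "data-node-b", 2: "data-node-c"}

for doc_id in ["order-1", "order-2", "order-3"]:
    shard = pick_shard(doc_id, len(shard_to_node))
    print(doc_id, "-> shard", shard, "on", shard_to_node[shard])
```

With the shards spread over three data nodes, a request sent to an arbitrary data node lands on the node holding the target shard only about a third of the time; otherwise it has to be forwarded, which is the extra hop point 5 refers to.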
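For point 6, the appeal of HTTP is that a client only needs to send JSON over HTTP, with no Elasticsearch library that must match the cluster version. This sketch builds (but does not send) a `match` query against the `_search` endpoint using only the Python standard library; the host, the index name `my-index`, the field `title`, and the helper `build_search_request` are placeholders for illustration.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build, but do not send, a plain HTTP search request.

    Any node with HTTP enabled (port 9200 by default) could be the target.
    """
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "my-index", "title", "elasticsearch")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running node, urllib.request.urlopen(req) would send it.
```

Because the request is just a URL plus a JSON body, the same call works from any language or tool that speaks HTTP, including curl.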

Hope that helps.