7
votes

I've recently stood up a test ELK stack Ubuntu box to test the functionality and have been very happy with it. My use case for production would involve ingesting at least 100GB of logs per day. I want to be as scalable as possible, as this 100GB/day can quickly rise as we had more log sources.

I read some articles on ELK production, including the fantasic Logz.io ELK Deployment. While I have a general idea of what I need to do, I am unsure on some core concepts, how many machines I need for such a large amount of data and whether I need a broker like Redis included in my architecture.

What is the point of a broker like Redis? In my test instance, I have multiple log sources sending logs over TCP,syslog, and logstash forwarder to my Logstash directly on my ELK server (which also has Elasticsearch, Nginx, and Kibana installed configured with SSL).

In order to retain a high availability, state of the art production cluster, what machines+specs do I need for at least 100GB of data per day, likely scaling toward 150GB or more in the future? I am planning using my own servers. From what I've researched, the starting point should like something like (assuming I include Redis):

  • 2/3 servers with a Redis+Logstash(indexer) instance for each server. For specs, I am thinking 32GB RAM, fast I/O disk 500GB maybe SSD, 8 cores (i7)
  • 3 servers for Elasticsearch (this is the one I am most unsure about) -- I know I need at least 3 master nodes and 2 data nodes, so 2 servers will have 1 master/1 data each -- these will be beefy 64GB RAM, 20TB, 8 cores. The other remaining master node can be on a low spec machine, as it is not handling data.
  • 2 servers for Nginx/Kibana -- these should be low spec machines, as they are just the web server and UI. Is a load balancer necessary here?

EDIT: Planning on keeping the logs for 60 days.

1
How long are you going to keep the logs? See stackoverflow.com/questions/30331768/… for some numbers.Magnus Bäck

1 Answers

11
votes

As for Redis, it acts as a buffer in case logstash and/or elasticsearch are down or slow. If you're using the full logstash or logstash-forwarder as a shipper, it will detect when logstash is unavailable and stop sending logs (remembering where it left off, at least for a while).

So, in a pure logstash/logstash-forwarder environment, I see little reason to use a broker like redis.

When it becomes important is for sources that don't care about logstash's status and don't buffer in their side. syslog, snmptrap, and others fall into this category. Since your sources include syslog, I would bring up brokers in your setup.

Redis is a RAM-intensive app, and that amount of memory that you have will dictate how long of a logstash outage you can withstand. On a 32GB server (shared with logstash), how much of the memory would you give yo redis? How large is your average document size? How many documents would it take to fill the memory? How long does it take to generate that many documents? In my experience, redis fails horribly when the memory fills, but that could just have been me.

Logstash is a CPU-intensive process as all the filters get executed.

As for the size of the elasticsearch cluster, @magnus already pointed you to some information that might help. Starting with 64GB machines is great, and then scale horizontally as needed.

You should have two client (non-data) nodes that are used as the access point for inserts (efficiently dispatching the requests to the correct data node) and searches (handling the 'reduce' phase with data returned from the data nodes). Two of these in a failover config would be a good start.

Two kibana machines will give you redundancy. Putting them in a failover config is also good. nginx was more used with kibana3, I believe. I don't know if people are using it with kibana4 or have moved to 'shield'.

Hope that helps.