0 votes

I currently have a basic Elasticsearch cluster in which I use a river to index data. I want to scale for future growth in two phases; the number of documents indexed per second is the likely bottleneck.

  1. Phase 1: Indexing 100 documents per second into elasticsearch
  2. Phase 2: Indexing 10000 documents per second into elasticsearch

How should I go about it?

Thanks in advance!

Edit:
I am trying to index the Twitter stream. Each document is around 2 KB. Hardware is flexible: right now I have magnetic disks (with 50 GB of RAM), but getting SSDs (and a better configuration) is no problem.

I'd use the bulk API for that purpose, but if you don't give us more information about your data size, your hardware, and what you are trying to achieve, we won't be able to help you! – eliasah
@eliasah thanks. I've edited my question with the details. – huhahihi
Are you using Logstash, a river, or another solution? – eliasah
Yes, I am using the Elasticsearch Twitter river at the moment. But if it can't keep up in the future, I am fine with writing my own code to stream and index the tweets. – huhahihi
First, rivers are deprecated and will be removed in future versions. Second, Logstash is more flexible than rivers. For example, you might want to perform extra preprocessing on the input; rivers don't allow that, unlike Logstash. – eliasah

1 Answer

1 vote

A few highlights that come from experiments and articles:
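One of the techniques raised in the comments is the bulk API: batching many documents into a single `_bulk` request is the standard way to push indexing throughput well past what per-document requests allow. Below is a minimal sketch, using only the Python standard library, of building the NDJSON body the `_bulk` endpoint expects; the index name `tweets`, the document shape, and the `localhost:9200` endpoint are assumptions for illustration, not part of the question.

```python
import json


def to_bulk_ndjson(docs, index="tweets"):
    """Build an NDJSON body for Elasticsearch's /_bulk endpoint.

    Each (id, source) pair becomes two lines: an action/metadata line
    and a source line. The body must end with a trailing newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical tweets; in practice these would come from the Twitter stream.
tweets = [
    ("1", {"user": "alice", "text": "hello"}),
    ("2", {"user": "bob", "text": "world"}),
]
body = to_bulk_ndjson(tweets)
# POST this body to http://localhost:9200/_bulk with the header
# Content-Type: application/x-ndjson (via urllib, requests, or curl).
```

In a real pipeline you would tune the batch size (a few MB per request is a common starting point) and send batches from several concurrent workers; the official Python client's `helpers.bulk` wraps this same format for you.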

Have fun!