I currently have a basic Elasticsearch cluster and use a river to index data. I want to scale it for future growth in two phases. The likely bottleneck is the number of documents indexed per second.
- Phase 1: Indexing 100 documents per second into Elasticsearch
- Phase 2: Indexing 10,000 documents per second into Elasticsearch
How should I go about it?
Thanks in advance!
Edit:
I am trying to index the Twitter stream.
Each document = around 2 KB.
Hardware is flexible. Right now I have magnetic disks and 50 GB of RAM, but moving to SSDs (and a better configuration) is not a problem.
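For a rough sense of the raw data volume behind the two phases, here is a back-of-the-envelope sketch. It assumes "around 2 KB" means 2048 bytes per document and counts only raw document bytes, so index overhead, replicas, and compression are ignored. The function name `raw_ingest` and the constants are just illustrative names, not anything from Elasticsearch.

```python
# Back-of-the-envelope ingest estimate (raw document bytes only).
# Assumption: "around 2 KB" per document is taken as exactly 2048 bytes.
DOC_SIZE_BYTES = 2 * 1024
SECONDS_PER_DAY = 24 * 60 * 60


def raw_ingest(docs_per_second, doc_size_bytes=DOC_SIZE_BYTES):
    """Return (bytes per second, bytes per day) for a steady indexing rate."""
    bytes_per_second = docs_per_second * doc_size_bytes
    bytes_per_day = bytes_per_second * SECONDS_PER_DAY
    return bytes_per_second, bytes_per_day


# The two target rates from the phases above.
for phase, rate in (("Phase 1", 100), ("Phase 2", 10_000)):
    per_second, per_day = raw_ingest(rate)
    print(f"{phase}: {per_second / 1024**2:.2f} MiB/s, "
          f"{per_day / 1024**3:.1f} GiB/day")
```

Under these assumptions, Phase 1 comes to about 0.2 MiB/s (roughly 16.5 GiB per day) and Phase 2 to about 19.5 MiB/s (roughly 1.6 TiB per day) of raw data, before any indexing overhead.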