I need to index ~1 billion records.

Queries against Elasticsearch are by month range (not only by a single month).

What would be faster?

  1. Save my documents in separate indexes, say one index per month, or
  2. Save everything in one index, where one of the document fields is 'date', and filter by that field?

1 Answer

If you are querying by month range, definitely split your indexes by month. With a billion documents, you'll probably want many shards spread across many nodes, and splitting by date gives you that. The alternative is a single index with a large number of shards. With a billion documents, that probably means dozens or hundreds of shards, depending on your document size and hardware.

However, if you split by date, most of your shards can cheaply answer that 0 documents match your query (assuming you get your filter query right), and only the handful of shards that hold the data for the months covered actually do the work. So it's like querying a smaller index that has all of the data you need for the query.
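
To make the month-range idea concrete, here is a minimal sketch in Python. It only builds the index names and the search body, so it runs without a cluster. The `records-YYYY.MM` naming scheme and the helper names are assumptions made for illustration; the `date` field comes from the question.

```python
from datetime import date

# Hypothetical index prefix; use whatever naming scheme fits your data.
INDEX_PREFIX = "records"


def month_index(year, month):
    """Name of the monthly index for a given year and month."""
    return f"{INDEX_PREFIX}-{year:04d}.{month:02d}"


def indexes_for_range(start, end):
    """Monthly index names covering every month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(month_index(year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def range_filter_query(start, end_exclusive):
    """Search body with a range clause on 'date' in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            "date": {
                                "gte": start.isoformat(),
                                "lt": end_exclusive.isoformat(),
                            }
                        }
                    }
                ]
            }
        }
    }


targets = indexes_for_range(date(2023, 11, 1), date(2024, 2, 29))
body = range_filter_query(date(2023, 11, 1), date(2024, 3, 1))

# Comma-separated index names can be used as the target of one search request.
print(",".join(targets))
# prints: records-2023.11,records-2023.12,records-2024.01,records-2024.02
```

Sending `body` to just those four indexes touches only the shards that hold the requested months. Searching a wildcard such as `records-*` with the same filter should also work, relying on the out-of-range shards to report zero matches cheaply, as described above.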