
We want to use Elasticsearch for our search use case.

We store issue data (think of something like Jira, but more structured). Each ISSUE_TYPE has some common fields, such as requestor and assigned to, plus some fields specific to that particular kind of problem.

I plan to create one Elasticsearch index per ISSUE_TYPE. To enable search across ISSUE_TYPEs, I plan to run a cross-index search, something like elasticsearch_endpoint/*/_search. Our use case is read-heavy. I am debating between static mapping and dynamic mapping (with dynamic templates). Static mapping provides more control but is more restrictive. Dynamic mapping comes with the risk of mapping explosion.

I want to understand how Elasticsearch scales and which factors determine its performance. How will it behave (in terms of read latency) in the following cases:

  • More data (many issues of the same type, but not many ISSUE_TYPEs), with small individual documents.
  • More fields in the same index (20 vs. 2000).
  • More indexes, each with a similar number of fields.
  • More indexes, some having 20 fields and others having 2000 fields.

Would really appreciate any pointers.

Thanks


1 Answer


Regarding your queries:

More data (many issues of the same type, but not many ISSUE_TYPEs), with small individual documents.

This should be fine as long as "more data" does not mean more than about 50 GB per shard. For further reference, see: https://discuss.elastic.co/t/too-big-a-shard-vs-too-many-shards/75889
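As a rough illustration of that rule of thumb, here is a minimal sketch that estimates how many primary shards an index would need to stay under a per-shard size cap. The function name `primary_shards_needed` is made up for this example, the 50 GB default is just the guideline mentioned above, and the calculation assumes documents are spread evenly across shards, which real data may not do.

```python
import math


def primary_shards_needed(total_index_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest primary shard count that keeps each shard at or under max_shard_gb.

    Assumes data is distributed evenly across shards (an assumption, not a guarantee).
    """
    if max_shard_gb <= 0:
        raise ValueError("max_shard_gb must be positive")
    if total_index_gb <= 0:
        return 1  # an index always has at least one primary shard
    return max(1, math.ceil(total_index_gb / max_shard_gb))


# Example: a 120 GB index under the 50 GB guideline, and a small 30 GB index.
print(primary_shards_needed(120))
print(primary_shards_needed(30))
```

Treat the result as a starting point for benchmarking rather than a fixed answer, since the right shard size also depends on hardware and query patterns.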

More fields in the same index (20 vs. 2000).

As you mentioned, mapping explosion can become a problem when an index has too many fields. Try to find a more efficient mapping, and do not introduce extra fields unnecessarily.
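One way to keep the field count under control is a static mapping that rejects unknown fields. The sketch below only builds the JSON body you would send when creating an index; it does not talk to a cluster. It assumes a typeless mapping as used by Elasticsearch 7 and later, and relies on the `index.mapping.total_fields.limit` setting and the `"dynamic": "strict"` mapping option. The helper name `build_index_body` and the type-specific fields (`severity`, `steps_to_reproduce`) are hypothetical; `requestor` and `assigned_to` come from the question.

```python
import json
from typing import Dict, List


def build_index_body(
    common_fields: List[str],
    specific_fields: Dict[str, str],
    field_limit: int = 1000,
) -> dict:
    """Build a create-index body with an explicit, strict mapping."""
    # Common fields are mapped as keyword here purely for illustration.
    properties = {name: {"type": "keyword"} for name in common_fields}
    # Type-specific fields carry their own Elasticsearch field type.
    for name, es_type in specific_fields.items():
        properties[name] = {"type": es_type}
    return {
        # Hard cap on the number of mapped fields in this index.
        "settings": {"index.mapping.total_fields.limit": field_limit},
        # "strict" makes indexing fail on unmapped fields instead of
        # silently adding them to the mapping.
        "mappings": {"dynamic": "strict", "properties": properties},
    }


body = build_index_body(
    ["requestor", "assigned_to"],
    {"severity": "keyword", "steps_to_reproduce": "text"},  # hypothetical fields
)
print(json.dumps(body, indent=2))
```

With a strict mapping, a document carrying an unexpected field is rejected at index time, so the trade-off is less flexibility in exchange for a mapping that cannot grow unnoticed.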

More indexes, each with a similar number of fields.

Again, this depends on your definition of "more indexes". Having too few indexes, each holding a very large amount of data, is bad, but having too many indexes with very little data each is not a good idea either. If you have too many issue types, consider storing more than one issue type per index: introduce a new field for the issue type value, and then filter by issue type within that index.
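To make the "one field for the issue type" idea concrete, here is a small sketch of a search body that matches text and restricts results to selected issue types with a `bool` query and a `terms` filter. It assumes the issue type is stored in a field called `issue_type` mapped as `keyword`, and that there is a text field called `summary`; both names, and the helper `build_issue_search`, are hypothetical.

```python
from typing import List


def build_issue_search(text: str, issue_types: List[str], text_field: str = "summary") -> dict:
    """Build a search body that scores on text and filters on issue type."""
    return {
        "query": {
            "bool": {
                # Scored full-text match on the (assumed) text field.
                "must": [{"match": {text_field: text}}],
                # Filter clauses do not affect scoring and can be cached.
                "filter": [{"terms": {"issue_type": issue_types}}],
            }
        }
    }


query = build_issue_search("login fails", ["bug", "incident"])
print(query)
```

Searching across all issue types in the shared index then just means leaving the filter out or listing every type.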

More indexes, some having 20 fields and others having 2000 fields.

This is more or less covered by the previous points; without more context, not much more can be said.

I want to understand how Elasticsearch scales and which factors determine its performance.

This really depends on your data size, your ES mapping, and your resources (RAM, number of cores, etc.). The only way to find the right balance is to benchmark your own use case. For example, to find out how many indexes is "too many", keep increasing the number of indexes until you notice a drop in search performance. You can either write some scripts to help with the benchmarking, or explore https://github.com/elastic/rally
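If you go the script route, a minimal latency harness might look like the sketch below. It times any zero-argument callable, so you can pass in whatever function issues your real search request; the names `measure_latency` and `fake_search` are made up here, and the stand-in workload only simulates a call so the example runs without a cluster. The p95 value uses a simple nearest-rank calculation.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(search_fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Call search_fn repeatedly and report latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls are not timed, so caches and connections can settle.
    for _ in range(warmup):
        search_fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    # Nearest-rank 95th percentile.
    p95_index = min(len(samples_ms) - 1, math.ceil(0.95 * len(samples_ms)) - 1)
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


def fake_search() -> int:
    """Stand-in for a real search call (hypothetical workload)."""
    return sum(range(10_000))


stats = measure_latency(fake_search, runs=20, warmup=2)
print(stats)
```

Running the same harness after each change (more indexes, more fields, more data) gives comparable numbers, which is the point at which a drop in search performance becomes visible.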