31
votes

We are planning to introduce Elasticsearch (on AWS) for our multi-tenant application. We have the following options:

  1. Using One Index Per Tenant
  2. Using One Type Per Tenant
  3. All Tenants Share One Index with Custom routing

According to this blog post, https://www.elastic.co/blog/found-multi-tenancy, the first option can cause memory issues, but I am not clear about the other options.

It seems that with the third option there is no data segregation, and I am not sure about security.

I believe the second option would be better, as the data would be segregated.

Please help me identify the best option for using Elasticsearch with multi-tenancy.

Please note that we will be running on AWS infrastructure.

What is a tenant in your context? - Val
Each client is considered as a Tenant. - Selvakumar Ponnusamy
Then the answer depends on how many tenants/clients we are talking about (1-10, 10-100, 100-1000, ?) and the growth factor you're expecting, i.e. is the number of clients stable, or do you expect an x% increase within the next N months? When deciding which strategy to take, you need to think of tomorrow, not today. - Val
There is a 4th option that you haven't mentioned: all tenants share one time-based index with custom routing. That's the most flexible option when your client count will increase over time. - Val
Hello @SelvakumarPonnusamy, I would like to know which approach you chose. We have the same questions and are looking for past experience. I would appreciate it if you could share yours. Thanks. - Doston

2 Answers

30
votes

We are considering the same question right now, and the following set of articles by Elasticsearch was very helpful.

Start here: https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html

And read through each subsequent article until you hit this one: https://www.elastic.co/guide/en/elasticsearch/guide/current/finite-scale.html

The following two were very eye-opening for me:

https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/one-big-user.html

The basic takeaway:

  • Alias per customer
  • Shard routing
  • Now you can have dedicated indices for big customers and shared indices for little customers, and to the application they all appear to be separate indices
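
For what it's worth, here is a minimal sketch of what the alias-per-customer plus shard-routing idea could look like. It only builds the JSON bodies you would POST to the `_aliases` endpoint; it does not talk to a cluster. The index names (`tenants_shared`, `tenant_initech_v1`), the `tenant_` alias prefix, and the `tenant_id` field are my own assumptions for illustration, and the term filter assumes `tenant_id` is mapped as an exact-value (keyword) field.

```python
import json

SHARED_INDEX = "tenants_shared"  # hypothetical shared index name


def alias_name(tenant_id):
    # Hypothetical naming convention: one alias per tenant.
    return f"tenant_{tenant_id}"


def shared_alias_action(tenant_id, index=SHARED_INDEX, field="tenant_id"):
    """Build an `add` action for a small tenant living in the shared index.

    The filter hides other tenants' documents; the routing value keeps
    this tenant's documents on a single shard of the shared index.
    """
    return {
        "add": {
            "index": index,
            "alias": alias_name(tenant_id),
            "filter": {"term": {field: tenant_id}},
            "routing": tenant_id,
        }
    }


def promote_to_dedicated_index(tenant_id, dedicated_index, shared_index=SHARED_INDEX):
    """Build an alias swap for a tenant that has outgrown the shared index.

    Assumes the tenant's documents were already copied into
    `dedicated_index`. Both actions go in one request body.
    """
    return {
        "actions": [
            {"remove": {"index": shared_index, "alias": alias_name(tenant_id)}},
            {"add": {"index": dedicated_index, "alias": alias_name(tenant_id)}},
        ]
    }


small_tenants = {"actions": [shared_alias_action(t) for t in ("acme", "globex")]}
big_tenant = promote_to_dedicated_index("initech", "tenant_initech_v1")

print(json.dumps(small_tenants, indent=2))
print(json.dumps(big_tenant, indent=2))
```

The application always reads and writes through `tenant_<id>`, so moving a tenant from the shared index to its own index should not require a client-side change. Note that a filtered alias is a convenience, not an access-control mechanism on its own.
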
5
votes

This link is too important not to mention here: http://www.bigeng.io/elasticsearch-scaling-multitenant/

It covers good architecture dilemmas, with great performance analysis and reasoning.

TL;DR: they used index groups built around shard allocation filtering to segregate load across nodes in the cluster.
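
To make the shard allocation filtering part a bit more concrete, here is a rough sketch of the kind of index settings involved. This is not the linked article's actual setup; the `index_group` node attribute, the group names, and the index names are all hypothetical. It assumes the nodes were started with a matching custom attribute, and it only builds the bodies you would send with `PUT /<index>/_settings`.

```python
import json

NODE_ATTRIBUTE = "index_group"  # hypothetical custom node attribute


def allocation_settings(group, attribute=NODE_ATTRIBUTE):
    """Settings body that pins an index to nodes tagged with `group`."""
    return {f"index.routing.allocation.require.{attribute}": group}


def settings_requests(index_groups, attribute=NODE_ATTRIBUTE):
    """Return (path, body) pairs, one per index, ordered by group name."""
    planned = []
    for group, indices in sorted(index_groups.items()):
        for index in indices:
            planned.append((f"/{index}/_settings", allocation_settings(group, attribute)))
    return planned


# Hypothetical grouping: shared small-tenant indices on one set of nodes,
# a big tenant's dedicated index on another.
index_groups = {
    "group_a": ["tenants_shared_1", "tenants_shared_2"],
    "group_b": ["tenant_initech_v1"],
}

planned = settings_requests(index_groups)
for path, body in planned:
    print("PUT", path, json.dumps(body))
```

With settings like these, shards of each index are only allocated to nodes whose attribute matches, which is what keeps one group's load off the other group's nodes.
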