2
votes

Elasticsearch parent/child relationships require the parent and its children to live on the same shard, which is enforced by setting the _routing field at index time.

I was wondering whether routing documents the same way would improve performance when using the field collapsing feature of Elasticsearch, or whether it would make it worse.

If we look at both cases:

1) Routing to the same shard: the shard can do the collapsing on its own and return documents that are already fully collapsed.

2) Documents are spread over many shards: every shard returns lots of documents, and the collapsing can only happen afterwards, once all the results have been gathered.

I do not know whether Elasticsearch would still do the second even when the documents are on the same shard.

Thanks.


1 Answer

0
votes

The full genesis of field collapsing (introduced in ES 5.3) can be found in PR 22337 (issue 21833).

Initially, the idea was to create a new top_groups aggregation, modeled after a terms+top_hits combo, but in the end it was deemed too costly to implement and not necessarily optimal.

Field collapsing was ultimately implemented in the search layer, because it can reuse the existing query/fetch phases and requires a lot less memory than doing it as an aggregation. Pagination also works out of the box.

It was discussed whether it would be a good idea to use the grouping field as a routing key to make sure all top hits were located on the same shard, but in the end this was deemed too big a limitation.

So, long story short, field collapsing does not require all documents of a group to be located on the same shard, because the fetch request (phase 2) will be sent to all shards anyway.
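To illustrate why routing is not needed for correctness, here is a toy model in Python of a two-step collapse: each shard keeps its best hit per group, and a coordinator merges the per-shard results. This is only a sketch and not Elasticsearch's actual code; the function names (`collapse_top`, `search`, `distribute`) and the sample documents are made up for the example.

```python
def collapse_top(hits, size):
    """Keep the best-scoring hit per group, then return the top `size` groups."""
    best = {}
    for hit in hits:
        group = hit["group"]
        if group not in best or hit["score"] > best[group]["score"]:
            best[group] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:size]


def search(shards, size):
    """Toy two-step search: collapse on each shard, then merge and collapse again."""
    per_shard = [collapse_top(shard, size) for shard in shards]
    merged = [hit for hits in per_shard for hit in hits]
    return collapse_top(merged, size)


def distribute(docs, num_shards, key):
    """Assign each document to a shard using `key(doc) % num_shards`."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[key(doc) % num_shards].append(doc)
    return shards


# Hypothetical sample data: three groups with two documents each.
docs = [
    {"id": 1, "group": 0, "score": 0.9},
    {"id": 2, "group": 0, "score": 0.4},
    {"id": 3, "group": 1, "score": 0.7},
    {"id": 4, "group": 1, "score": 0.8},
    {"id": 5, "group": 2, "score": 0.5},
    {"id": 6, "group": 2, "score": 0.3},
]

# Case 1: routed by the collapse key, so each group sits on a single shard.
routed = distribute(docs, 2, key=lambda d: d["group"])
# Case 2: spread by document id, so each group is split across shards.
spread = distribute(docs, 2, key=lambda d: d["id"])

routed_ids = [hit["id"] for hit in search(routed, size=2)]
spread_ids = [hit["id"] for hit in search(spread, size=2)]
print(routed_ids, spread_ids)
```

In this model both layouts return the same collapsed top hits; the only difference is how many candidate hits the coordinator has to merge, which is bounded by `size` per shard in either case.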

As always, the best way is to try it out for yourself and measure the performance.

  • 1 index with 1 shard (with and without routing key)
  • 1 index with several shards (with and without routing key)
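The test setups above could be driven by requests along these lines. This is only a sketch: the index name `collapse_test` and the field `group_id` are hypothetical, the typeless `_doc` endpoint and mapping assume Elasticsearch 7 or later, and the helpers only build the requests, so you would still send them with the HTTP client of your choice and compare the `took` value of the search responses.

```python
from urllib.parse import urlencode

INDEX = "collapse_test"    # hypothetical index name
GROUP_FIELD = "group_id"   # hypothetical keyword field to collapse on


def create_index_request(num_shards):
    """Return (method, path, body) to create the test index with `num_shards` shards."""
    body = {
        "settings": {"number_of_shards": num_shards, "number_of_replicas": 0},
        # Collapsing needs a keyword or numeric field with doc values.
        "mappings": {"properties": {GROUP_FIELD: {"type": "keyword"}}},
    }
    return "PUT", f"/{INDEX}", body


def index_request(doc_id, doc, use_routing):
    """Return (method, path, body) to index one document, optionally routed by group."""
    path = f"/{INDEX}/_doc/{doc_id}"
    if use_routing:
        # Route by the collapse key so a whole group lands on one shard.
        path += "?" + urlencode({"routing": doc[GROUP_FIELD]})
    return "PUT", path, doc


def collapse_search_request(query, size=10):
    """Return (method, path, body) for a search collapsed on GROUP_FIELD."""
    body = {"query": query, "collapse": {"field": GROUP_FIELD}, "size": size}
    return "POST", f"/{INDEX}/_search", body


doc = {"group_id": "g1", "title": "a"}
print(create_index_request(num_shards=3))
print(index_request(1, doc, use_routing=True))
print(index_request(1, doc, use_routing=False))
print(collapse_search_request({"match_all": {}}))
```

Running the same collapsed search against each of the four setups, several times and after a warm-up, should give numbers you can compare.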

My take is that it would make no big difference, because only the top hits are collapsed, and a normal search query (without field collapsing) goes through both the query and fetch phases anyway.