We are working on a multi-language document CMS, so each document exists in several translations.

For search with Elasticsearch we use one index per language (german, english, french, ...), and all translations of the same document share the same ID.

When a user searches for specific terms, we would like to search across all languages but return only a list of distinct IDs. As far as I know, this is only possible with a terms aggregation like the following:

curl -H 'Content-Type: application/json' 'localhost:9200/german,english,french/_search?pretty=1' -d '{
    "aggs": {
        "asset_ids": {
            "terms": {
                "field": "_id"
            }
        }
    }
}'

This works fine, but as the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order) states, it returns the distinct IDs ordered by the number of documents per bucket.

My question is: Is it possible to retrieve a list of distinct IDs from multiple indices, ordered by the relevance of the documents they represent? Or is there a better approach for our scenario?

Thanks!

1 Answer


In case anyone is interested, here is how we solved this problem. It is probably not the best solution.

Adding a top_hits aggregation inside the terms aggregation puts the top-scoring documents and their scores into each bucket:

curl -H 'Content-Type: application/json' 'localhost:9200/german,english,french/_search?pretty=1' -d '{
    "aggs": {
        "asset_ids": {
            "terms": {
                "field": "_id"
            },
            "aggregations": {
                "top_id_hits": {
                    "top_hits": {}
                }
            }
        }
    }
}'
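
The request above carries no query clause, so every document would get the same score. The following is a minimal sketch of a request body that also includes the user's search terms. The field name `content`, the helper name `build_search_body`, and the example search string are assumptions for illustration, not part of the original setup; the body would be sent to the same `_search` endpoint.

```python
import json


def build_search_body(search_terms, text_field="content", max_ids=10):
    """Build a search body: match query plus terms/top_hits aggregations.

    `text_field` is a hypothetical field name; replace it with the field
    that actually holds the translated text in each language index.
    """
    return {
        # Only the aggregation buckets are needed, not the raw hit list.
        "size": 0,
        # The query is what gives each document a relevance score.
        "query": {"match": {text_field: search_terms}},
        "aggs": {
            "asset_ids": {
                # One bucket per distinct document ID across all indices.
                "terms": {"field": "_id", "size": max_ids},
                "aggs": {
                    # Keep only the single best-scoring translation per ID.
                    "top_id_hits": {"top_hits": {"size": 1}}
                },
            }
        },
    }


body = build_search_body("annual report")
print(json.dumps(body, indent=4))
```

Limiting top_hits to one hit keeps the response small, since only the best score per ID is needed for the ordering.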

Sorting the returned buckets by the score of their best-scoring document (max_score) then does the trick.
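
The sorting step could be sketched on the client side as follows. This assumes the aggregation names `asset_ids` and `top_id_hits` from the request above; the helper name `sort_ids_by_relevance` and the sample response are made up for illustration and contain only the parts of the response the function reads.

```python
def sort_ids_by_relevance(response):
    """Return the bucket keys (document IDs), best max_score first."""
    buckets = response["aggregations"]["asset_ids"]["buckets"]

    def best_score(bucket):
        # max_score can be null when no scores were computed; treat as 0.
        return bucket["top_id_hits"]["hits"]["max_score"] or 0.0

    ranked = sorted(buckets, key=best_score, reverse=True)
    return [bucket["key"] for bucket in ranked]


# Made-up, trimmed response for illustration only.
sample_response = {
    "aggregations": {
        "asset_ids": {
            "buckets": [
                {"key": "a", "doc_count": 3,
                 "top_id_hits": {"hits": {"max_score": 0.4}}},
                {"key": "c", "doc_count": 2,
                 "top_id_hits": {"hits": {"max_score": 1.3}}},
                {"key": "b", "doc_count": 1,
                 "top_id_hits": {"hits": {"max_score": 2.1}}},
            ]
        }
    }
}

print(sort_ids_by_relevance(sample_response))  # prints ['b', 'c', 'a']
```

Note that this only reorders the buckets that were returned. The terms aggregation still selects its buckets by document count, so an ID that falls outside the requested bucket size never reaches the sort.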

See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html