We are currently working on a multi-language document CMS, so each document exists as translations in several languages.
For searching with Elasticsearch, we are currently using one index per language (german, english, french, ...) where all translations of the same document share the same ID.
When a user searches for specific terms, we would like to search across all languages but return only a list of distinct IDs. As far as I know, this is only possible with a terms aggregation like the following:
    curl 'localhost:9200/german,english,french/_search?pretty=1' -d '{
      "aggs": {
        "asset_ids": {
          "terms": {
            "field": "_id"
          }
        }
      }
    }'
This works fine, but as the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order) states, the buckets are ordered by their document count by default, so the distinct IDs come back sorted by the number of documents per bucket rather than by score.
My question is: is it possible to retrieve a list of distinct IDs from multiple indices, ordered by the relevance of the documents they represent? Or is there a better approach for our scenario?
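To make the desired result concrete, here is a minimal client-side sketch in Python of the ordering I am after. It is not something we run today: the function name `distinct_ids_by_relevance` and the sample hits are made up for illustration, and it assumes each hit is available as an (`_id`, `_score`) pair and that scores from the different language indices can be compared directly, which may not hold in practice.

```python
from typing import Dict, Iterable, List, Tuple


def distinct_ids_by_relevance(hits: Iterable[Tuple[str, float]]) -> List[str]:
    """Collapse (doc_id, score) pairs to distinct IDs, best score first.

    Hypothetical helper: each ID is ranked by the highest score that any
    of its translations achieved.
    """
    best: Dict[str, float] = {}
    for doc_id, score in hits:
        # Keep only the highest score seen so far for this ID.
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Sort by score descending; break ties by ID so the output is stable.
    return sorted(best, key=lambda doc_id: (-best[doc_id], doc_id))


# Made-up hits, as if merged from the german, english and french indices.
sample_hits = [("42", 1.3), ("7", 2.1), ("42", 3.4), ("13", 0.8), ("7", 0.5)]
print(distinct_ids_by_relevance(sample_hits))  # ['42', '7', '13']
```

Doing this on the client would mean fetching every hit first, so ideally Elasticsearch would return the IDs in this order directly.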
Thanks!