I'm not sure how best to phrase this question. I'm using Elasticsearch 2.2.
Let's start with an example of the dataset, made of 4 documents:
[
  {
    "header": {
      "called_entity": { "uuid": "a" },
      "coverage_entity": {},
      "successful_transfers": 1
    }
  },
  {
    "header": {
      "called_entity": { "uuid": "a" },
      "coverage_entity": { "uuid": "b" },
      "successful_transfers": 1
    }
  },
  {
    "header": {
      "called_entity": { "uuid": "b" },
      "coverage_entity": { "uuid": "a" },
      "successful_transfers": 1
    }
  },
  {
    "header": {
      "called_entity": { "uuid": "b" },
      "coverage_entity": { "uuid": "a" },
      "successful_transfers": 0
    }
  }
]
called_entity always has a uuid. coverage_entity can be empty or have a uuid.
What I want is to aggregate on either called_entity.uuid or coverage_entity.uuid, and then get the total number of documents and the sum of successful_transfers for each uuid. So, for these 4 documents, I would expect a result like this:
uuid,doc_count,successful_transfers_count
"a",4,3
"b",3,2
The problem is that the same document has to be counted in several buckets, as long as the bucket key appears in either called_entity.uuid or coverage_entity.uuid (I'm not even sure that's possible, which is why I'm posting here).
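To make the counting rule concrete, here is a minimal Python sketch of the computation I'm after, done client-side on the sample documents. The function name aggregate_by_either_uuid is made up for illustration, and counting a document only once when both uuids happen to be equal is my assumption. It only describes the semantics I want; it is not an Elasticsearch solution.

```python
from collections import defaultdict

# The sample documents from above.
docs = [
    {"header": {"called_entity": {"uuid": "a"}, "coverage_entity": {},
                "successful_transfers": 1}},
    {"header": {"called_entity": {"uuid": "a"}, "coverage_entity": {"uuid": "b"},
                "successful_transfers": 1}},
    {"header": {"called_entity": {"uuid": "b"}, "coverage_entity": {"uuid": "a"},
                "successful_transfers": 1}},
    {"header": {"called_entity": {"uuid": "b"}, "coverage_entity": {"uuid": "a"},
                "successful_transfers": 0}},
]


def aggregate_by_either_uuid(documents):
    """Count each document under every distinct uuid it mentions (hypothetical helper)."""
    totals = defaultdict(lambda: {"doc_count": 0, "successful_transfers": 0})
    for doc in documents:
        header = doc["header"]
        # Use a set so a document whose two uuids are equal is counted once (assumption).
        uuids = {header["called_entity"]["uuid"]}
        coverage_uuid = header["coverage_entity"].get("uuid")
        if coverage_uuid is not None:
            uuids.add(coverage_uuid)
        for uuid in uuids:
            totals[uuid]["doc_count"] += 1
            totals[uuid]["successful_transfers"] += header["successful_transfers"]
    return dict(totals)


result = aggregate_by_either_uuid(docs)
for uuid in sorted(result):
    print(uuid, result[uuid]["doc_count"], result[uuid]["successful_transfers"])
```

On the sample data this yields 4 documents and 3 successful transfers for "a", and 3 documents and 2 successful transfers for "b", which is the result I would like Elasticsearch to return.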
What I'm currently doing is simply aggregating on the called_entity.uuid field, but of course that's not enough:
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "dim_1": {
      "terms": {
        "field": "header.called_entity.uuid",
        "size": 0
      },
      "aggs": {
        "successful_transfers": {
          "sum": {
            "field": "header.successful_transfers"
          }
        }
      }
    }
  }
}
This gives me something like:
uuid,doc_count,successful_transfers_count
"a",2,2
"b",2,1
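This is not what I want, because documents where the uuid only appears in coverage_entity are missing from each bucket. So, how can I aggregate on values coming from several fields, or, for a given bucket, compute metrics over every document that references its key (not only the documents the terms aggregation put in that bucket)?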
Thank you.