My data

    { "city":"New York", "street":"Atlantic Avenue" }
    { "city":"New York", "street":"Hudson Street" }
    { "city":"New York", "street":"Fawn Court" }
    { "city":"Boston", "street":"Atlantic Avenue" }
    { "city":"Boston", "street":"Hudson Street" }
    { "city":"Boston", "street":"7th Avenue" }
    { "city":"Washington DC", "street":"Atlantic Avenue" }
    { "city":"Washington DC", "street":"Dogwood Drive" }
    { "city":"Washington DC", "street":"7th Avenue" }

If we count each time that a street appears, we will get the following:

| Street name     | Number of times |
|-----------------|-----------------|
| Atlantic Avenue | 3               |
| Hudson Street   | 2               |
| 7th Avenue      | 2               |
| Fawn Court      | 1               |
| Dogwood Drive   | 1               |

My goal

I want to build a histogram that shows how many street names appear exactly once, how many appear twice, and so on.

To do so, I think I need to take the result of a terms aggregation and feed it into a histogram aggregation.

Here is the result for the example above:

| Times a street appears | Number of streets |
|------------------------|-------------------|
| 1                      | 2                 |
| 2                      | 2                 |
| 3                      | 1                 |

What I have already done

I have built the first table with Kibana and this query:

    curl -XGET 'http://localhost:9200/index1/type1/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_street": {
          "terms": { "field": "street" }
        }
      }
    }'

Then I tried to feed these results into a histogram:

    curl -XGET 'http://localhost:9200/fingerprint/user/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "histogram_streets": {
          "histogram": {
            "field": "group_by_street>_count",
            "interval": 1
          },
          "aggs": {
            "group_by_street": {
              "terms": { "field": "street" }
            }
          }
        }
      }
    }'

However, I retrieve an empty bucket.

Any idea how I can do this?

Thanks!

I'm pretty sure this field name is wrong: group_by_street>_count. What are you trying to do here? – Phil

@Phil I want to build the histogram based on the "doc_count" field of the "group_by_street" aggregation. – marc_aragones

1 Answer

I want to build the histogram based on the "doc_count" field of the "group_by_street" aggregation

ES aggregations are document based, meaning you can use a bucket aggregation to group by a field on documents and then further aggregate inside each bucket as a nested aggregation on another document field. You can use scripts to combine document fields in novel ways, but I've never seen a capability to aggregate on the number of buckets or the number of documents in a bucket.

The only place I'm aware of a path notation like group_by_street>_count working is for specifying the sort path in a nested aggregation as documented here.

I would suggest you create the histogram client side from the terms aggregation result. In a way, the terms aggregation is already a histogram, since it tells you directly how many documents have each term. You just need to regroup the (term, doc_count) buckets in a way that is meaningful for you. I'm not sure whether you can do this directly in Kibana.
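As a rough illustration of the client-side approach, here is a minimal Python sketch. The `response` dictionary is hand-written to mimic the `aggregations` part of a terms aggregation result for the sample data in the question, and the helper name `histogram_of_doc_counts` is made up for this example. It simply counts how many buckets share each `doc_count`.

```python
from collections import Counter

# Hypothetical response, shaped like the "aggregations" section returned by
# the terms aggregation in the question (values taken from the sample data).
response = {
    "aggregations": {
        "group_by_street": {
            "buckets": [
                {"key": "Atlantic Avenue", "doc_count": 3},
                {"key": "Hudson Street", "doc_count": 2},
                {"key": "7th Avenue", "doc_count": 2},
                {"key": "Fawn Court", "doc_count": 1},
                {"key": "Dogwood Drive", "doc_count": 1},
            ]
        }
    }
}


def histogram_of_doc_counts(response, agg_name="group_by_street"):
    """Map each doc_count to the number of terms that have that doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    counts = Counter(bucket["doc_count"] for bucket in buckets)
    # Sort by doc_count so the result reads like a histogram.
    return dict(sorted(counts.items()))


histogram = histogram_of_doc_counts(response)
for occurrences, streets in histogram.items():
    print(f"{occurrences} occurrence(s): {streets} street(s)")
```

One caveat: a terms aggregation only returns the top terms (10 by default), so the `size` of the aggregation would need to be large enough to cover every street for the histogram to be complete.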