I am new to Elasticsearch and would like to know whether the following can be done with normal term and range queries plus sorting, or whether I need to use Elasticsearch aggregations.

I have a set of documents with several string fields and one date field. I would like to select the top N of those documents that exactly match (term queries) two of the string fields and whose date falls within some date range, and get the counts for those top N docs.

A typical doc looks like this:

{ "_id" : ObjectId("55b0a8b448f3bdb6bf26683c"),
        "type" : "type1",
        "time-gmt" : ISODate("2015-07-23T08:41:29.299Z"),
        "sID" : "id1"}

I would like to find, say, the top 10 sIDs among docs with type="type1" in a certain "time-gmt" range, as well as how many docs each of those 10 sIDs has. So a table of the result set would look like:

sID1    120
sID2    100
sID3    90
...
sID10   3
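To pin down what "top N with counts" means here, this is a plain-Python sketch of the desired computation, with no Elasticsearch involved. It assumes the docs are plain dicts with the "type", "time-gmt" and "sID" fields from the example above; the function name `top_sids` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def top_sids(docs, doc_type, start, end, n=10):
    """Count docs per sID among those whose type equals doc_type and whose
    time-gmt lies in [start, end], and return the n most frequent sIDs
    as (sID, count) pairs, highest count first."""
    counts = Counter(
        d["sID"]
        for d in docs
        if d["type"] == doc_type and start <= d["time-gmt"] <= end
    )
    return counts.most_common(n)


# Hypothetical sample data shaped like the document in the question.
utc = timezone.utc
docs = [
    {"type": "type1", "time-gmt": datetime(2015, 7, 23, 8, 41, tzinfo=utc), "sID": "id1"},
    {"type": "type1", "time-gmt": datetime(2015, 7, 23, 9, 0, tzinfo=utc), "sID": "id1"},
    {"type": "type1", "time-gmt": datetime(2015, 7, 23, 10, 0, tzinfo=utc), "sID": "id2"},
    {"type": "type2", "time-gmt": datetime(2015, 7, 23, 10, 0, tzinfo=utc), "sID": "id2"},
    {"type": "type1", "time-gmt": datetime(2015, 7, 24, 10, 0, tzinfo=utc), "sID": "id3"},
]

start = datetime(2015, 7, 23, tzinfo=utc)
end = datetime(2015, 7, 23, 23, 59, 59, tzinfo=utc)
print(top_sids(docs, "type1", start, end))  # [('id1', 2), ('id2', 1)]
```

Grouping by sID and counting is exactly what a terms aggregation does, which suggests plain term/range queries with sorting alone will not produce this table.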
Can you clarify with some data samples and the results you expect to see? I think there are some contradictions in your description, or at least some points I am not understanding. Specifically, can you clarify "the counts of those top N docs" and why you are grouping and ordering by fields that will have the same value for all the results? - eemp
I edited my question. - JennyToy

1 Answer

If I understood your question correctly, part of what you're looking for (counts of documents of a particular type) should be achievable using bucket aggregations.

Bucket aggregations allow you to group documents into, well, one or more buckets. More on that here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html

and here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
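In the search response, a terms bucket aggregation comes back as a list of buckets, each with a "key" and a "doc_count". Here is a small sketch that flattens those buckets into (key, count) rows like the table in the question. It assumes the response body has already been parsed into a dict; the helper name `bucket_rows` and the aggregation name "by_sid" are hypothetical.

```python
def bucket_rows(response, *agg_path):
    """Walk down the named (possibly nested) aggregations and return the
    terms buckets found there as (key, doc_count) tuples."""
    node = response["aggregations"]
    for name in agg_path:
        # e.g. a filter aggregation wrapping a terms sub-aggregation
        node = node[name]
    return [(b["key"], b["doc_count"]) for b in node["buckets"]]


# Hand-written response fragment in the shape Elasticsearch uses for a
# top-level terms aggregation named "by_sid" (values borrowed from the question).
response = {
    "aggregations": {
        "by_sid": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "sID1", "doc_count": 120},
                {"key": "sID2", "doc_count": 100},
            ],
        }
    }
}

for key, count in bucket_rows(response, "by_sid"):
    print(f"{key}\t{count}")
```

For a terms aggregation nested inside a filter aggregation, pass both names, e.g. `bucket_rows(response, "time_filtered", "type_agg")`.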

In your particular case, I think you could query your Elasticsearch cluster by fashioning a query similar to the following:

curl -XPOST 'http://<your-elasticsearch-cluster-endpoint>/_all/_search?pretty' -H 'Content-Type: application/json' -d '{
    "size": 0,
    "aggregations": {
        "time_filtered": {
            "filter": {
                "range": {
                    "time-gmt": {
                        "gte": <tick>,
                        "lte": <tick>
                    }
                }
            },
            "aggregations": {
                "type_agg": {
                    "terms": {
                        "field": "type"
                    }
                }
            }
        }
    }
}'

I believe this query would get you, for the documents whose "time-gmt" falls within the given range, the number of documents for each value found in the type field.
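For the table in the question (top 10 sIDs for one type within a time range), a variation would be to filter on both type and time in the query and bucket by sID instead. Below is a sketch that only builds the request body as a Python dict; sending it, for example with curl as above, is left out. Assumptions: "type" and "sID" are mapped as not-analyzed (keyword) fields so that term and terms match whole values, and the cluster supports the bool query's filter clause (Elasticsearch 2.0 and later; 1.x used the filtered query instead). The function name `top_sid_query` and the aggregation name "top_sids" are hypothetical.

```python
import json


def top_sid_query(doc_type, start, end, n=10):
    """Build a search body that keeps docs of one type inside a time range,
    then buckets them by sID, keeping the n largest buckets."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": doc_type}},
                    {"range": {"time-gmt": {"gte": start, "lte": end}}},
                ]
            }
        },
        "aggregations": {
            "top_sids": {
                # terms buckets are ordered by doc_count, descending, by default
                "terms": {"field": "sID", "size": n}
            }
        },
    }


body = top_sid_query("type1", "2015-07-23T00:00:00Z", "2015-07-23T23:59:59Z")
print(json.dumps(body, indent=2))
```

Putting the conditions in filter context skips scoring, which is not needed when only counts matter. Note that on indices with several shards, the counts a terms aggregation reports can be approximate.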

Hope this was helpful enough.