2
votes

The Elasticsearch documentation states that "the top_hits aggregation returns regular search hits, because of this many per hit features can be supported". Crucially, the list of supported features includes "Named filters and queries".

But trying to add any filter or query inside top_hits throws SearchParseException: Unknown key for a START_OBJECT.

Use case: I have items, each of which has a list of nested comments:

items{id} -> comments {date, rating}

I want to get the top-rated comment from the last week for each item.

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "items": {
      "terms": {
        "field": "id",
        "size": 10
      },
      "aggs": {
        "comment": {
          "nested": {
            "path": "comments"
          },
          "aggs": {
            "top_comment": {
              "top_hits": {
                "size": 1,
                //need filter  here to select only comments of last week
                "sort": {
                  "comments.rating": {
                    "order": "desc"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

So is the documentation wrong, or is there any way to add a filter?

https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-top-hits-aggregation.html

1
could you post your query and some sample documents and desired output? - ChintanShah25
@ChintanShah25 updated the question - Sumit Jain

1 Answer

-1
votes

Are you sure you have mapped the comments as nested? I've just tried to execute such a query on my data and it worked fine.

If so, you could simply add a filter aggregation right after the nested aggregation (hopefully I haven't messed up the curly brackets):

POST data/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested": {
          "path": "comments",
          "query": {
            "range": {
              "comments.date": {
                "gte": "now-1w",
                "lte": "now"
              }
            }
          }
        }
      }
    }
  },
  "aggs": {
    "items": {
      "terms": {
        "field": "id",
        "size": 10
      },
      "aggs": {
        "nested": {
          "nested": {
            "path": "comments"
          },
          "aggs": {
            "filterComments": {
              "filter": {
                "range": {
                  "comments.date": {
                    "gte": "now-1w",
                    "lte": "now"
                  }
                }
              },
              "aggs": {
                "topComments": {
                  "top_hits": {
                    "size": 1,
                    "sort": {
                      "comments.rating": "desc"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

P.S. Always include the FULL path for fields of nested objects (e.g. comments.date, not just date).

So this query will:

  1. Keep only the documents that actually have comments newer than one week, which narrows down the set of documents to aggregate (filtered query)
  2. Do a terms aggregation on the id field
  3. Open nested sub documents (comments)
  4. Filter them by date
  5. Return the most badass one (the highest rated)
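
To make it clearer why both the query-level filter (step 1) and the filter aggregation (step 4) are there, here is a rough in-memory Python sketch of what the five steps compute. It does not talk to Elasticsearch at all; the sample items, the fixed NOW value and the function name top_recent_comments are made up for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical sample data, invented purely for illustration.
NOW = datetime(2016, 1, 15)
ITEMS = [
    {"id": 1, "comments": [
        {"date": datetime(2016, 1, 14), "rating": 3},
        {"date": datetime(2016, 1, 10), "rating": 5},
        {"date": datetime(2015, 12, 1), "rating": 9},   # older than a week
    ]},
    {"id": 2, "comments": [
        {"date": datetime(2015, 11, 20), "rating": 8},  # older than a week
    ]},
]


def top_recent_comments(items, now, window=timedelta(weeks=1)):
    """Mimic the five steps of the query on plain Python data."""
    cutoff = now - window

    def is_recent(comment):
        # Same idea as the range filter: now-1w <= comments.date <= now
        return cutoff <= comment["date"] <= now

    # Step 1: keep only items that have at least one recent comment.
    # This matches whole items, so their old comments still come along.
    matching = [i for i in items if any(is_recent(c) for c in i["comments"])]

    result = {}
    for item in matching:                 # Step 2: one bucket per id
        comments = item["comments"]       # Step 3: open the nested comments
        recent = [c for c in comments if is_recent(c)]  # Step 4: filter by date
        # Step 5: pick the highest rated of the remaining comments
        result[item["id"]] = max(recent, key=lambda c: c["rating"])
    return result


top = top_recent_comments(ITEMS, NOW)
```

Note that without step 4, item 1 would return its rating-9 comment from December, because step 1 only selects whole items and does not drop their old comments. That is why the date range appears twice in the query.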