1
votes

How can I do an SQL like group by statement on a '_search' query in elastic search?

I basically need to:

1 - Filter a bunch of items using multiple filters, queries etc. Done

2 - Put these results into buckets of unique category_id. 'category_id' is currently mapped as a 'float' field of the item document type. I also need to display one of the items matching the above filters from each bucket.

3 - Paginate through these buckets

Note: Item count: 1 Million, Unique category_id count: 60,000

I would like to get all of the data type 'items' grouped by a field called . In the results I would like to get a list of all unique 'category_id' and a single item in each category (first or any item, doesn't matter) inside this group. I'd like to be able to use "from" and "size" to paginate through these results.

For example if i had data to the effect of:

id:1, category_id: 1, color:'blue',
id:2, category_id: 1, color:'red',
id:3, category_id: 1, color:'red',
id:4, category_id: 2, color:'blue',
id:5, category_id: 2, color:'red',
id:6, category_id: 3, color:'blue',
id:7, category_id: 3, color:'blue',
id:8, category_id: 3, color:'blue',

For example i want to get all that have the color 'red' then grouped by category_id and get back data to the effect of:

category_id: 1
{
 item: { id:2, category_id: 1, color:'red'} 
},
category_id: 2
{
 item: { id:5, category_id: 2, color:'red'} 
}

This is what i have so far, but it doesn't get the correct top hit, and i dont think it allows multiple filters and queries or is paginatable.

GET swap/item/_search
{
  "size": 0,
  "aggs": {
    "color_filtered_items": {
      "filter": {
        "and": [
          {
            "terms": {
              "color": [
                "red"
              ]
            }
          }
        ]
      },
      "aggs": {
        "group_by_cat_id": {
          "terms": {
            "field": "category_id",
            "size": 10
          },
          "aggs": {
            "items": {
              "top_hits": {
                "_source": {
                  "include": [
                    "name",
                    "id",
                    "category_id",
                    "color"
                  ]
                },
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}

Hacks, workaround, changes to data storage suggestions welcome. Any help greatly appreciated. Thank you all :)

1

1 Answers

0
votes

The following should work , assuming that you don't want number range based aggregation for category_id.

Also you cant do pagination on aggregated results , but then you can control the size per aggregation.

{
  "aggs": {
    "itemsAgg": {
      "terms": {
        "field": "items",
        "size": 10
      },
      "aggs": {
        "categoryAgg": {
          "terms": {
            "field": "category_id",
            "size": 10
          }
        }
      }
    }
  }
}