I want to implement facet filters on my product list page using Elasticsearch. My product index contains a number of products, each of which contains a number of variations.

The variations are defined as "nested" objects to make sure that only products with a variation matching all filter criteria are returned. Filtering on the variations seems to work correctly. However, the facet results are not what I would expect from a facet filter.

For example, my query below returns the terms "oriental" and "citrus" for the facet "f_attribute_scent". However, I only want to get back the term that matched my filter, which would be "citrus".

I have tried a lot of different things with facet filters, but I just can't get it to work correctly.

My mapping looks like this:

curl -XPOST localhost:9200/products -d '
{
   "mappings": {
      "de": {
         "properties": {
            "variants": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}
'

Here is my test data:

curl -XPUT localhost:9200/products/de/12 -d '
{
    "id": "12",
    "categories": [
        {
            "id": "12345",
            "sort": "1"
        },
        {
            "id": "23456",
            "sort": "2"
        },
        {
            "id": "34567",
            "sort": "3"
        }
    ],
    "variants": [
        {
            "id": "12.1.1",
            "brand": "guerlain",
            "collection": "emporio",
            "rating": 4,
            "color": "green",
            "price": 31,
            "scent": "fruity"
        },
        {
            "id": "12.1.2",
            "brand": "guerlain",
            "collection": "emporio",
            "rating": 2,
            "color": "blue",
            "price": 49.99,
            "scent": "flowery"

        }
    ]
}'

curl -XPUT localhost:9200/products/de/15 -d '
{
    "id": "15",
    "categories": [
        {
            "id": "12345",
            "sort": "1"
        },
        {
            "id": "23456",
            "sort": "2"
        },
        {
            "id": "34567",
            "sort": "3"
        }
    ],
    "variants": [
        {
            "id": "15.1.1",
            "brand": "dior",
            "collection": "foobar",
            "rating": 4,
            "color": "green",
            "price": 48.00,
            "scent": "oriental"
        },
        {
            "id": "15.1.2",
            "brand": "dior",
            "collection": "foobar",
            "rating": 2,
            "color": "red",
            "price": 52,
            "scent": "citrus"
        }
    ]
}'

This is the query:

GET /products/de/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "variants",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "terms": {
                              "variants.color": [
                                 "green",
                                 "red"
                              ]
                           }
                        },
                        {
                           "term": {
                              "variants.scent": "citrus"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   },
   "facets": {
      "f_attribute_color": {
         "terms": {
            "all_terms": true,
            "field": "variants.color"
         }
      },
      "f_attribute_scent": {
         "terms": {
            "field": "variants.scent"
         }
      }
   }
}

And the result:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "products",
            "_type": "de",
            "_id": "15",
            "_score": 1,
            "_source": {
               "id": "15",
               "categories": [
                  {
                     "id": "12345",
                     "sort": "1"
                  },
                  {
                     "id": "23456",
                     "sort": "2"
                  },
                  {
                     "id": "34567",
                     "sort": "3"
                  }
               ],
               "variants": [
                  {
                     "id": "15.1.1",
                     "brand": "dior",
                     "collection": "foobar",
                     "rating": 4,
                     "color": "green",
                     "price": 48,
                     "scent": "oriental"
                  },
                  {
                     "id": "15.1.2",
                     "brand": "dior",
                     "collection": "foobar",
                     "rating": 2,
                     "color": "red",
                     "price": 52,
                     "scent": "citrus"
                  }
               ]
            }
         }
      ]
   },
   "facets": {
      "f_attribute_color": {
         "_type": "terms",
         "missing": 0,
         "total": 2,
         "other": 0,
         "terms": [
            {
               "term": "red",
               "count": 1
            },
            {
               "term": "green",
               "count": 1
            }
         ]
      },
      "f_attribute_scent": {
         "_type": "terms",
         "missing": 0,
         "total": 2,
         "other": 0,
         "terms": [
            {
               "term": "oriental",
               "count": 1
            },
            {
               "term": "citrus",
               "count": 1
            }
         ]
      }
   }
}

3 Answers

1
votes

Based on the sample data you indexed, you see both "citrus" and "oriental" in the terms facet results because each document holds its variants as an array, and both terms are present in the document that matched your query.

From the Elasticsearch Facets Documentation:

There’s one important distinction to keep in mind. While search queries restrict both the returned documents and facet counts, search filters restrict only returned documents — but not facet counts.

If you need to restrict both the documents and facets, and you’re not willing or able to use a query, you may use a facet filter.

Given the documentation and the results you are asking for, you may want to look into using a facet filter instead.

1
votes

Your nested docs are being indexed in two ways:

  1. as independent documents, one for each element in the variants array, and
  2. in the top level de document as if you had set the variants field to be type object

The reason for (2) above is that you set include_in_parent to true. So actually, the top level doc looks like:

{
    "id": "12",
    "variants.id":    [ "12.1.1","12.1.2"],
    "variants.brand": [ "guerlain", "guerlain"],
    "variants.color": [ "green", "blue"],
    ... etc ...
}

Your query uses the nested filter correctly, which identifies the top level documents which match, but then you facet on the top-level doc, not the nested docs, which is why you are getting all of the results.
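The difference between the two views can be illustrated with a small simulation. This is only a sketch in plain Python, not Elasticsearch code: the helper names `flatten_variants` and `matches` are made up for illustration, and product 15 is reduced to the two fields used in the question's filter.

```python
from collections import Counter

# Product 15 from the question, reduced to the two fields used in the filter.
product = {
    "id": "15",
    "variants": [
        {"color": "green", "scent": "oriental"},
        {"color": "red", "scent": "citrus"},
    ],
}


def flatten_variants(doc):
    """Mimic the top-level view created by include_in_parent:
    every variant field becomes one flat list of values (hypothetical helper)."""
    flat = {}
    for variant in doc["variants"]:
        for field, value in variant.items():
            flat.setdefault("variants." + field, []).append(value)
    return flat


def matches(variant):
    """The nested filter from the question: color in (green, red) AND scent == citrus."""
    return variant["color"] in ("green", "red") and variant["scent"] == "citrus"


# The nested filter only decides whether the product as a whole is a hit.
is_hit = any(matches(v) for v in product["variants"])

# A facet on the top-level document sees every scent stored on the hit.
flat = flatten_variants(product)
top_level_scents = Counter(set(flat["variants.scent"])) if is_hit else Counter()

# A facet on the nested documents, with the same filter, sees only matching variants.
nested_scents = Counter(v["scent"] for v in product["variants"] if matches(v))

print(dict(top_level_scents))  # both "oriental" and "citrus", each with count 1
print(dict(nested_scents))     # only "citrus", with count 1
```

The first counter corresponds to the facet result in the question, the second to what the question is asking for.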

To fix it, all you need to do is to change your facets to use the nested docs instead, and to add the same nested filter that you used in your main query as a facet_filter:

"facets": {
  "f_attribute_color": {
     "terms": {
        "field": "variants.color"
     },
     "nested": "variants",
     "facet_filter": {
        "bool": {
           "must": [
              {
                 "terms": {
                    "variants.color": [
                       "green",
                       "red"
                    ]
                 }
              },
              {
                 "term": {
                    "variants.scent": "citrus"
                 }
              }
           ]
        }
     }
  },
  "f_attribute_scent": {
     "terms": {
        "field": "variants.scent"
     },
     "nested": "variants",
     "facet_filter": {
        "bool": {
           "must": [
              {
                 "terms": {
                    "variants.color": [
                       "green",
                       "red"
                    ]
                 }
              },
              {
                 "term": {
                    "variants.scent": "citrus"
                 }
              }
           ]
        }
     }
  }
}
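
Because the same filter has to be repeated in the main query and in every facet, it may be easier to build the request body in code. The following is a minimal sketch in Python that only assembles the JSON body shown above. The function name `nested_terms_facets` is hypothetical, and sending the body to Elasticsearch is left out.

```python
import json


def nested_terms_facets(path, fields, facet_filter):
    """Build one nested terms facet per field, all sharing the same facet_filter
    (hypothetical helper, mirrors the facet structure shown above)."""
    return {
        "f_attribute_" + field: {
            "terms": {"field": path + "." + field},
            "nested": path,
            "facet_filter": facet_filter,
        }
        for field in fields
    }


# The variant filter from the question, defined once.
variant_filter = {
    "bool": {
        "must": [
            {"terms": {"variants.color": ["green", "red"]}},
            {"term": {"variants.scent": "citrus"}},
        ]
    }
}

body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            # Same filter as in the facets, wrapped in a nested filter for the hits.
            "filter": {"nested": {"path": "variants", "filter": variant_filter}},
        }
    },
    "facets": nested_terms_facets("variants", ["color", "scent"], variant_filter),
}

print(json.dumps(body, indent=2))
```

Defining the filter once keeps the hits and the facet counts from drifting apart when a filter value changes.
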
0
votes

You are correct: if I use your facet filters, only "citrus" is returned for the "scent" facet.

However, if I filter by the brand name "dior", I have the same problem again. The facet result returns "dior" with a count of 2, because both variations now have the same brand name:

GET /products/de/_search
{
   "filter": {
      "nested": {
         "path": "variants",
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "variants.brand": "dior"
                     }
                  }
               ]
            }
         }
      }
   },
   "facets": {
      "f_attribute_brand": {
         "nested": "variants",
         "facet_filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "variants.brand": "dior"
                     }
                  }
               ]
            }
         },
         "terms": {
            "field": "variants.brand"
         }
      }
   }
}

And the result:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "products",
            "_type": "de",
            "_id": "15",
            "_score": 1
         }
      ]
   },
   "facets": {
      "f_attribute_brand": {
         "_type": "terms",
         "missing": 0,
         "total": 2,
         "other": 0,
         "terms": [
            {
               "term": "dior",
               "count": 2
            }
         ]
      }
   }
}
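
One way to read this result: a facet that uses "nested" counts the nested variant documents, not the products they belong to. The sketch below reproduces the count of 2 in plain Python and contrasts it with a per-product count. It is only an illustration of the counting, under the assumption that each variant is treated as its own document. The `product` key is added here for the illustration and is not an Elasticsearch feature.

```python
from collections import Counter

# The four variants from the test data, each tagged with its parent product id.
variants = [
    {"product": "12", "brand": "guerlain"},
    {"product": "12", "brand": "guerlain"},
    {"product": "15", "brand": "dior"},
    {"product": "15", "brand": "dior"},
]

# The facet_filter from the query above: brand == "dior".
matching = [v for v in variants if v["brand"] == "dior"]

# Counting per nested document, as the nested facet does.
per_variant = Counter(v["brand"] for v in matching)

# Counting each (product, brand) pair once, which is what a product list would show.
distinct_pairs = {(v["product"], v["brand"]) for v in matching}
per_product = Counter(brand for _, brand in distinct_pairs)

print(per_variant["dior"])  # 2, one per matching variant
print(per_product["dior"])  # 1, one per matching product
```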