
Running version 5.4 of Elasticsearch.

With this mapping:

PUT pizzas
{
  "mappings": {
    "pizza": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "types": {
          "type": "nested",
          "properties": {
            "topping": {
              "type": "keyword"
            },
            "base": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

And this data:

PUT pizzas/pizza/1
{
  "name": "meat",
  "types": [
    {
      "topping": "bacon",
      "base": "normal"
    },
    {
      "topping": "bacon",
      "base": "sour dough"
    },
    {
      "topping": "pepperoni",
      "base": "sour dough"
    }
  ]
}

If I run this query:

GET pizzas/_search
{
  "query": {
    "nested": {
      "path": "types",
      "query": {
        "bool": {
          "filter": {
            "term": {
              "types.topping": "bacon"
            }
          }
        }
      }
    }
  }
}

I get:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": [
      {
        "_index": "pizzas",
        "_type": "pizza",
        "_id": "1",
        "_score": 0,
        "_source": {
          "name": "meat",
          "types": [
            {
              "topping": "bacon",
              "base": "normal"
            },
            {
              "topping": "bacon",
              "base": "sour dough"
            },
            {
              "topping": "pepperoni",
              "base": "sour dough"
            }
          ]
        }
      }
    ]
  }
}

But what I really want for my hits is:

"hits": [
  {
    "_index": "pizzas",
    "_type": "pizza",
    "_id": "1",
    "_score": 0,
    "_source": {
      "name": "meat",
      "types": [
        {
          "topping": "bacon",
          "base": "normal"
        }
      ]
    }
  },
  {
    "_index": "pizzas",
    "_type": "pizza",
    "_id": "1",
    "_score": 0,
    "_source": {
      "name": "meat",
      "types": [
        {
          "topping": "bacon",
          "base": "sour dough"
        }
      ]
    }
  }
]

I want this so that if a user searches for "bacon", they get a list of the pizza options that include that topping.

Is this even supported by Elasticsearch? I can separate out my results programmatically but I'm hoping it's built in.
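For comparison, the programmatic separation mentioned here might look something like the following minimal Python sketch. It assumes the search response has already been decoded into a dict, and split_by_topping is a hypothetical helper name. It simply re-applies the topping filter to each hit's "_source" on the client.

```python
from copy import deepcopy


def split_by_topping(hits, topping, path="types"):
    """Return one copy of each hit per nested object whose topping matches.

    Hypothetical helper: it re-applies the term filter on the client, so it
    duplicates the matching logic Elasticsearch has already run.
    """
    results = []
    for hit in hits:
        for entry in hit["_source"].get(path, []):
            if entry.get("topping") == topping:
                # Deep copies keep the original response untouched.
                new_hit = deepcopy(hit)
                new_hit["_source"][path] = [deepcopy(entry)]
                results.append(new_hit)
    return results
```

The drawback is that the filter logic lives in two places, which gets harder to keep in sync as the nested query grows.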

Thanks for your time.


2 Answers


You can use "inner_hits" to get the specific nested objects that matched in a nested query:

Query:

GET pizzas/_search
{
  "query": {
    "nested": {
      "path": "types",
      "query": {
        "bool": {
          "filter": {
            "term": {
              "types.topping": "bacon"
            }
          }
        }
      },
      "inner_hits": {
          "size": 10
      }
    }
  }
}

Note that "inner_hits" returns only 3 results by default unless you explicitly set a different "size". You can see the other options in the inner_hits documentation.

There doesn't seem to be a way to leave the size unbounded, so you have to set it higher than the maximum number of inner hits you expect for a single document.

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": [
      {
        "_index": "pizzas",
        "_type": "pizza",
        "_id": "1",
        "_score": 0,
        "_source": {
          "name": "meat",
          "types": [
            {
              "topping": "bacon",
              "base": "normal"
            },
            {
              "topping": "bacon",
              "base": "sour dough"
            },
            {
              "topping": "pepperoni",
              "base": "sour dough"
            }
          ]
        },
        "inner_hits": {
          "types": {
            "hits": {
              "total": 2,
              "max_score": 0,
              "hits": [
                {
                  "_nested": {
                    "field": "types",
                    "offset": 1
                  },
                  "_score": 0,
                  "_source": {
                    "topping": "bacon",
                    "base": "sour dough"
                  }
                },
                {
                  "_nested": {
                    "field": "types",
                    "offset": 0
                  },
                  "_score": 0,
                  "_source": {
                    "topping": "bacon",
                    "base": "normal"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

In your application code, you can then combine each hit with its "inner_hits" so that only the matching types are returned.
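A minimal Python sketch of that join, assuming the response above has been decoded into a dict. expand_inner_hits is a hypothetical helper, not part of any Elasticsearch client. It produces one hit per inner hit, replacing the document's "types" array with just that matched object. Sorting by "_nested"."offset" is a choice made here to restore the original array order, since the inner hits in the result above come back in a different order.

```python
from copy import deepcopy


def expand_inner_hits(response, path="types"):
    """Turn each matching nested object into its own hit.

    Hypothetical helper: expects a decoded search response whose nested
    query requested "inner_hits" for the nested path `path`.
    """
    expanded = []
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        # Restore the original array order; inner hits may arrive in score order.
        for inner_hit in sorted(inner, key=lambda h: h["_nested"]["offset"]):
            # Copy everything except inner_hits so the input is not mutated.
            new_hit = {k: deepcopy(v) for k, v in hit.items() if k != "inner_hits"}
            new_hit["_source"][path] = [deepcopy(inner_hit["_source"])]
            expanded.append(new_hit)
    return expanded
```

Applied to the result above, this yields two hits for document 1, one with the "normal" base and one with the "sour dough" base, which is the shape asked for in the question.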


One possible way of solving this is to use a parent/child relationship (the "_parent" field) and split the pizzas out from their types:

PUT pizzas
{
  "mappings": {
    "pizza": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "rating": {
          "type": "integer"
        }
      }
    },
    "type": {
      "_parent": {
        "type": "pizza" 
      },
      "properties": {
        "topping": {
          "type": "keyword"
        },
        "base": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT pizzas/pizza/1
{
  "name": "meat",
  "rating": 5
}

PUT pizzas/type/1?parent=1
{
  "topping": "bacon",
  "base": "normal"
}

PUT pizzas/type/2?parent=1
{
  "topping": "bacon",
  "base": "sour dough"
}

PUT pizzas/type/3?parent=1
{
  "topping": "pepperoni",
  "base": "sour dough"
}

You can then search the child documents directly, and each hit still tells you which parent it belongs to via "_parent".

Query:

GET pizzas/type/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "topping": "bacon"
        }
      }
    }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": [
      {
        "_index": "pizzas",
        "_type": "type",
        "_id": "1",
        "_score": 0,
        "_routing": "1",
        "_parent": "1",
        "_source": {
          "topping": "bacon",
          "base": "normal"
        }
      },
      {
        "_index": "pizzas",
        "_type": "type",
        "_id": "2",
        "_score": 0,
        "_routing": "1",
        "_parent": "1",
        "_source": {
          "topping": "bacon",
          "base": "sour dough"
        }
      }
    ]
  }
}

In your code, you can then combine the child hits with their parents to rebuild the data structure you originally wanted.
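Child hits carry only the child's own fields plus "_parent", so the parent's fields (such as "name") have to be fetched separately. The following minimal Python sketch assumes those parent sources have already been loaded into a dict keyed by parent ID, for example with a multi-get ("_mget") on pizzas/pizza using the IDs from parent_ids. Both helper names are hypothetical. It builds one option per matching child, which is the flat shape the question asked for.

```python
def parent_ids(child_hits):
    """Distinct parent IDs referenced by the child hits, in first-seen order."""
    return list(dict.fromkeys(hit["_parent"] for hit in child_hits))


def build_options(child_hits, parents_by_id, field="types"):
    """One option per matching child, each carrying its parent's fields.

    `parents_by_id` maps a parent ID to that parent's _source, fetched
    separately (assumption: e.g. a multi-get on the IDs from parent_ids()).
    """
    options = []
    for child in child_hits:
        parent_source = parents_by_id.get(child["_parent"])
        if parent_source is None:
            continue  # parent was not fetched; skip instead of guessing
        source = dict(parent_source)  # shallow copy; the parent dict is untouched
        source[field] = [dict(child["_source"])]
        options.append({"_id": child["_parent"], "_source": source})
    return options
```

With the two bacon children from the result above and the "meat" parent, this gives two options that both carry "name" and "rating" alongside a single type.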

Caveats

There are a couple of issues with changing the structure like this:

One: You can't use ordinary sorting to sort parents by a field on their children (source).

Two: If there are other fields on the parent that you also need to filter by, you will end up needing to run a query such as:

GET pizzas/pizza/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "rating": 5
        }
      },
      "must": {
        "has_child": {
          "type": "type",
          "query": {
            "bool": {
              "filter": {
                "term": {
                  "topping": "bacon"
                }
              }
            }
          }
        }
      }
    }
  }
}

This is then followed by a second query to fetch those specific children, which then have to be reattached to their parents.
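A minimal Python sketch of that second step, under a few assumptions: the body from children_query is sent to GET pizzas/type/_search, the "has_parent" query with an "ids" query restricts the children to the parents returned by the first search, and the default "size" of 100 is an arbitrary placeholder (searches return only 10 hits unless told otherwise). children_query and reattach are hypothetical helper names.

```python
def children_query(parent_ids, topping, size=100):
    """Body for a follow-up search on pizzas/type: children of the given
    parents that match the topping filter (hypothetical helper)."""
    return {
        "size": size,  # placeholder; must cover all matching children
        "query": {
            "bool": {
                "filter": [
                    {"term": {"topping": topping}},
                    {
                        "has_parent": {
                            "parent_type": "pizza",
                            "query": {"ids": {"values": list(parent_ids)}},
                        }
                    },
                ]
            }
        },
    }


def reattach(parent_hits, child_hits, field="types"):
    """Nest each child's _source under its parent, keeping parent order."""
    by_parent = {}
    for child in child_hits:
        by_parent.setdefault(child["_parent"], []).append(child["_source"])
    result = []
    for parent in parent_hits:
        source = dict(parent["_source"])  # shallow copy; input is not mutated
        source[field] = by_parent.get(parent["_id"], [])
        result.append({"_id": parent["_id"], "_source": source})
    return result
```

Keeping the parent hits in their original order preserves whatever scoring or sorting the first (has_child) query applied.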