
In a Udemy tutorial I came across this query:

{ "query": { "bool": {
    "must": {"match": {"genre": "Sci-Fi"}},
    "must_not": {"match": {"title": "trek"}},
    "filter": {"range": {"year": {"gte": 2010, "lt": 2015}}}
}}}

Is it possible to optimize it? I can think of two ways:

  1. Putting "genre" in a filter context. But a movie might belong to multiple genres, so I am not sure whether mapping the field as type keyword and using a term query inside filter would work there.

  2. Putting "must_not" directly in a filter context (without a bool) will not work because, as far as I understand, filters only select what to keep and cannot exclude documents. But if I wrapped must_not in a constant_score or a filter-bool, would the query be more performant? Or does ES take care of such optimizations automatically? I just don't understand why must_not is in the query context rather than the filter context in the first place. Can a document partially fail to match, so that its score is only reduced by some degree?


1 Answer


Regarding 1:

Moving the genre match to the filter context might speed things up a little (though that depends on many other factors), but the genre clause will no longer contribute to ranking, which may or may not matter to you. In short, use must when ranking is important, and use filter when your only goal is to decide whether a document matches some criteria.
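As a rough sketch of what that move looks like, here is the query from the question with the genre match placed next to the range clause in filter. Nothing here is new apart from the position of the genre clause; since no scoring clause remains, expect every hit to come back with the same score.

```json
{ "query": { "bool": {
    "filter": [
        {"match": {"genre": "Sci-Fi"}},
        {"range": {"year": {"gte": 2010, "lt": 2015}}}
    ],
    "must_not": {"match": {"title": "trek"}}
}}}
```

If you still want relevance ordering from some other clause, keep that clause in must and move only the pure yes/no criteria into filter.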

Moreover, using type keyword gives you exact-match semantics only. That might be what you want, or not, depending on how the queries are built (free-form user input or a controlled pick list).
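To make the keyword variant concrete, here is a minimal sketch that assumes the genre field is mapped as keyword (an assumption; the question does not show the mapping, and with a default dynamic mapping the exact-value field would typically be a genre.keyword sub-field instead). A term query compares against the stored value exactly, so "Sci-Fi" matches but "sci-fi" does not.

```json
{ "query": { "bool": {
    "filter": [
        {"term": {"genre": "Sci-Fi"}},
        {"range": {"year": {"gte": 2010, "lt": 2015}}}
    ],
    "must_not": {"match": {"title": "trek"}}
}}}
```

Regarding the multi-genre concern: a field holding several values, such as ["Action", "Sci-Fi"], matches a term query as soon as any one of its values is equal to the term, so movies with multiple genres are still found.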

Regarding 2:

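must_not already runs in a filter context, so it doesn't get any simpler than what you already have. The filter context is made up of both filter and must_not clauses, which is why a must_not clause only excludes documents and never changes the score of the ones that remain.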
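For comparison, the "filter-bool" wrapping asked about in the question would look roughly like the sketch below. It is a valid query, but given the point above it should select the same documents with the same scores as the original, so there is little reason to expect it to be faster.

```json
{ "query": { "bool": {
    "must": {"match": {"genre": "Sci-Fi"}},
    "filter": [
        {"range": {"year": {"gte": 2010, "lt": 2015}}},
        {"bool": {"must_not": {"match": {"title": "trek"}}}}
    ]
}}}
```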

One last thing, which I always add when someone asks about performance optimization: premature optimization is the root of all evil, so only optimize when you are actually seeing performance issues, never before.