I have a query with only a few 'shoulds' and 'filters', but one of the filters has a terms query with ~20,000 terms in it. Our max_terms_count is 200k but this is complaining about 'clauses'.

Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=too_many_clauses, reason=too_many_clauses: maxClauseCount is set to 1024]

I've written queries containing terms queries with far more terms than this. Why is this query causing a 'too many clauses' error? How can I rewrite this query to get the same result without the error?

{
    "query" : {
      "bool" : {
        "filter" : [
          {
            "nested" : {
              "query" : {
                "range" : {
                  "dateField" : {
                    "from" : "2019-12-03T21:34:30.653Z",
                    "to" : "2020-12-02T21:34:30.653Z",
                    "include_lower" : true,
                    "include_upper" : true,
                    "boost" : 1.0
                  }
                }
              },
              "path" : "observed_feeds",
              "ignore_unmapped" : false,
              "score_mode" : "none",
              "boost" : 1.0
            }
          }
        ],
        "should" : [
          {
            "bool" : {
              "filter" : [
                {
                  "terms" : {
                    "ipAddressField" : [
                      "123.123.123.123",
                      "124.124.124.124",
                      ... like 20,000 of these
                    ],
                    "boost" : 1.0
                  }
                }
              ],
              "adjust_pure_negative" : true,
              "boost" : 1.0
            }
          }
        ],
        "adjust_pure_negative" : true,
        "minimum_should_match" : "1",
        "boost" : 1.0
      }
    }
}

Edit: one note. I wrap the terms query in a should -> bool because we sometimes need to OR multiple terms queries together. This query happened not to be one of those cases.

1 Answer

You are hitting this with the terms query because the should clause sits outside the filter clause and therefore contributes to score calculation. That is why its terms are subject to max_clause_count (the maxClauseCount of 1024 in your error). If you do not need a score for that part, you can rewrite your query so that the should clause sits inside the top-level filter, as below:

{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "query": {
              "range": {
                "dateField": {
                  "from": "2019-12-03T21:34:30.653Z",
                  "to": "2020-12-02T21:34:30.653Z",
                  "include_lower": true,
                  "include_upper": true,
                  "boost": 1
                }
              }
            },
            "path": "observed_feeds",
            "ignore_unmapped": false,
            "score_mode": "none",
            "boost": 1
          }
        },
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "filter": [
                    {
                      "terms": {
                        "ipAddressField": [
                          "123.123.123.123",
                          "124.124.124.124",
                          ... like 20,000 of these
                        ],
                        "boost": 1
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ]
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}
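
If the query body is assembled in application code, the same shape can be generated for any number of OR'd terms queries. The following is a minimal sketch in Python, not the asker's actual client code: the function name build_query and its parameters are hypothetical, the field names are copied from the question, and the range uses gte/lte, which is equivalent to from/to with include_lower and include_upper set to true. It assumes, as the answer does, that no score is needed for the IP part.

```python
import json


def build_query(ip_groups, date_from, date_to):
    """Build a bool query in which every clause runs in filter context.

    ip_groups is a list of IP lists; each list becomes one terms query,
    and the terms queries are OR'd together. (Hypothetical helper.)
    """
    # One terms clause per group of IP addresses.
    should_clauses = [
        {"terms": {"ipAddressField": list(group)}} for group in ip_groups
    ]

    # Date restriction on the nested documents, as in the original query.
    date_filter = {
        "nested": {
            "path": "observed_feeds",
            "score_mode": "none",
            "query": {
                "range": {"dateField": {"gte": date_from, "lte": date_to}}
            },
        }
    }

    # The OR of the terms queries, placed inside the top-level filter so it
    # does not contribute to scoring. minimum_should_match is spelled out
    # here rather than relying on the default.
    ip_filter = {
        "bool": {"should": should_clauses, "minimum_should_match": 1}
    }

    return {"query": {"bool": {"filter": [date_filter, ip_filter]}}}


if __name__ == "__main__":
    body = build_query(
        [["123.123.123.123", "124.124.124.124"]],
        "2019-12-03T21:34:30.653Z",
        "2020-12-02T21:34:30.653Z",
    )
    print(json.dumps(body, indent=2))
```

Passing several lists in ip_groups produces several terms queries under the same should, which covers the case from the question's edit where multiple terms queries need to be OR'd together.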