I'm trying to write an Elasticsearch query that filters the result set. The application provides a filter for job titles and also an exclusion filter for the very same job titles. For example, in the data set below, I want to filter for Engineer but also exclude Software Engineer. The problem is that the query now also excludes Principal Software Engineer, and it shouldn't.

Here's the data I'm using:

{
  "data": [
    {
      "email": "[email protected]",
      "job_title": "Industrial Electrical Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Chief Revenue Officer"
    },
    {
      "email": "[email protected]",
      "job_title": "Principal Software Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Software Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Design Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Software Designer"
    },
    {
      "email": "[email protected]",
      "job_title": "Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Mechanical Design Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Electrical Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Chief Executive Officer"
    }
  ]
}

And here is the Elasticsearch query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "job_title": {
                    "query": "Engineer",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "term": {
            "user_id": 1
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "match": {
                  "job_title": "Software Engineer"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

2 Answers

2
votes

Assuming that job_title is of text type: Elasticsearch uses the standard analyzer for text fields if no analyzer is specified. This breaks "Software Engineer" into the following tokens:

{
  "tokens": [
    {
      "token": "software",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "engineer",
      "start_offset": 9,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

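So when you query with must_not and a match query for "Software Engineer", the query text is analyzed into the same two tokens. Because match combines its tokens with OR by default, every document that contains either software or engineer is excluded.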
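
The token breakdown above can be reproduced with the _analyze API. A minimal sketch of the request body, assuming it is sent as a POST to the /_analyze endpoint and that the field uses the default standard analyzer:

```json
{
  "analyzer": "standard",
  "text": "Software Engineer"
}
```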


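If you have not explicitly defined a mapping, dynamic mapping indexes the string both as a text field (job_title) and as a keyword sub-field (job_title.keyword). The keyword sub-field is not analyzed, so the whole title is stored as a single term. Use a term query on that sub-field to exclude only the exact title (notice the ".keyword" after the job_title field). Note that this exact match is case sensitive.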

Modify your query as follows:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "job_title": {
                    "query": "Engineer",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "term": {
                  "job_title.keyword": "Software Engineer"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Update 1:

If you are using Elasticsearch version 7.10 or above and you want the exact-term exclusion to be case insensitive as well, you can use the case_insensitive parameter of the term query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "job_title": {
                    "query": "Engineer",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "term": {
                  "job_title.keyword": {
                    "value": "software engineer",
                    "case_insensitive": true
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Otherwise, if you are using a version below 7.10, you need to add a lowercase normalizer to the keyword sub-field in your index mapping, as shown below, and then reindex the data:

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "job_title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "my_normalizer"
          }
        }
      }
    }
  }
}
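
With that mapping in place, both the indexed titles and the value passed to a term query on job_title.keyword go through the lowercase normalizer, so a plain term query becomes case insensitive. A minimal sketch of the resulting query, assuming the data has already been reindexed into an index that uses the mapping above:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "job_title": "Engineer" } }
      ],
      "must_not": [
        { "term": { "job_title.keyword": "software engineer" } }
      ]
    }
  }
}
```

Writing the value as "Software Engineer" should behave the same way, since the normalizer is also applied to the query term.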
0
votes

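You can use a match_phrase query in your must_not clause to exclude the phrase "Software Engineer" instead of the individual words. Note, however, that match_phrase matches any title that contains the phrase, so "Principal Software Engineer" is still excluded. To exclude only titles that are exactly "Software Engineer", use a term query on the keyword sub-field.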
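
A minimal sketch of that clause, reusing the job_title field and the Engineer filter from the question:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "job_title": "Engineer" } }
      ],
      "must_not": [
        { "match_phrase": { "job_title": "Software Engineer" } }
      ]
    }
  }
}
```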