Elasticsearch English stemming not working correctly

Question

I've added an English stemmer analyzer and filter to our query, but it doesn't seem to work correctly for plurals where 'y' becomes 'ies'. For example, when I search for 'raspberry', the results never include 'raspberries', and so on. I've tried both english and minimal_english, but I still get the same result.

Here's the analyzer and settings:

settings: {
  analysis: {
    analyzer: {
      custom_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "english_stemmer"],
      },
    },
    filter: {
      english_stemmer: {
        type: "stemmer",
        language: "english",
      },
    },
  },
}

What am I doing wrong?

I hope you are using the same analyzer at both index time and search time. — Nishant
@Opster ES Ninja Nishan I thought it does that by default? How does one check? — Bender Rodriguez

Nishant · Accepted Answer · 2020-12-09T04:14:42

Although english should work for the example you mentioned, you can also use porter_stem instead. It is equivalent to the stemmer filter with language set to english.

porter_stem in action:

POST /_analyze
{
  "tokenizer": "standard",
  "filter": ["porter_stem"],
  "text": ["raspberry", "raspberries"]
}

Response of the above request:

{
  "tokens" : [
    {
      "token" : "raspberri",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "raspberri",
      "start_offset" : 10,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 101
    }
  ]
}

You can see that raspberry and raspberries are both reduced to the same token, raspberri. Therefore, searching for raspberry will also match raspberries, and vice versa.
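For comparison, the same check can be run with the stemmer filter defined inline in the _analyze request, once with language english (which should produce the same raspberri tokens as porter_stem) and once with minimal_english, the other option mentioned in the question. This is a sketch that only exercises the filters, not your index:

POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "english" }
  ],
  "text": ["raspberry", "raspberries"]
}

POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "minimal_english" }
  ],
  "text": ["raspberry", "raspberries"]
}

If each response returns one shared token for both words (minimal_english should reduce raspberries to raspberry), the filters themselves are fine and the problem is more likely in how the analyzer is attached to the field.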

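Make sure that the field you index into and search on has custom_analyzer set as its analyzer in the mapping (using the settings from your question). Defining an analyzer under the index settings does not apply it to any field by itself; a text field without an analyzer in its mapping uses the standard analyzer. Once the field's mapping sets analyzer, a match query analyzes the search text with the field's search_analyzer, which defaults to that same analyzer, so index time and search time stay consistent.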
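To check which analyzer a field actually uses, you can inspect the field's mapping and then run _analyze against the field itself, so that Elasticsearch picks the analyzer from the mapping. In this sketch, your_index and field1 are placeholders for your own index and field names:

GET your_index/_mapping/field/field1

POST your_index/_analyze
{
  "field": "field1",
  "text": "raspberries"
}

If the mapping shows no analyzer entry for the field, or _analyze returns raspberries unchanged, the field is most likely still using the standard analyzer. The analyzer of an existing field generally cannot be changed in place, so this usually means creating a new index with the analyzer set in the mapping and reindexing the data.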

Working example:

Mapping:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stemmer"
          ]
        }
      },
      "filter": {
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "custom_analyzer"
      }
    }
  }
}

Indexing:

PUT test/_doc/1
{
  "field1": "raspberries"
}

PUT test/_doc/2
{
  "field1": "raspberry"
}

Search:

GET test/_search
{
  "query": {
    "match": {
      "field1": {
        "query": "raspberry"
      }
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.18232156,
        "_source" : {
          "field1" : "raspberries"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "field1" : "raspberry"
        }
      }
    ]
  }
}
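To see which term the match query actually searches for, the validate API with explain=true shows the rewritten query. Run against the test index above, it should show the query as field1:raspberri, which confirms that the search text went through custom_analyzer:

GET test/_validate/query?explain=true
{
  "query": {
    "match": {
      "field1": "raspberry"
    }
  }
}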

You can also look at another English stemmer, kstem.
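To compare kstem with porter_stem, the earlier _analyze request can be repeated with only the filter name changed. kstem is dictionary-based and less aggressive than the Porter stemmer, so it is worth checking its tokens for your own vocabulary before switching:

POST /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "kstem"],
  "text": ["raspberry", "raspberries"]
}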
