
I tried to apply a custom English analyzer, as well as the built-in english analyzer, in Elasticsearch. My main goal is to use stemming. Say I have the following words in my documents: covers, impression.

Now, if I search for, e.g., cover, impressive, or impressions, I get 0 results. I only get hits when I search for the exact terms "covers" or "impression".

These are my settings in Elasticsearch (following this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html):

{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer":  "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  }
}

My mapping looks as follows:

"mapping": {
  "_doc": {
     "properties": {
        "title": {"type": "text",
                   "analyzer": "rebuilt_english"},
        "description": {"type": "text",
                       "analyzer": "rebuilt_english"}
  }
 }
}

I also tried (according to a few different tutorials) to change the settings like this (I just add the changes here, not the full code again):

{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_english": {
          "type": "custom",
          "filter": #and so on...

Am I missing something here? As far as I understand, I need to define a named analyzer under "settings" and then reference that name in the "mapping" properties, so that every field is analyzed according to those settings.

I also tried to not set any specific settings and just set the analyzer properties (in mapping) for each item like:

"title": {"type": "text",
"analyzer": "english"}

Which also doesn't work (even when using filters like stemming).

I really tried to find a solution for hours, but I can't get it to work. Help would be much appreciated. Thanks!

UPDATE

This is the request I used to create the index. It is my latest attempt; as described above, I also tried other variations:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_english": {
          "type": "custom",
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
            ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": { "type": "text",
          "analyzer": "rebuilt_english"
        },
        "description": { "type": "text",
                    "analyzer": "rebuilt_english"}
                    }
        }
      }
    }
}

Can you post the actual index mapping? GET /index-name should return it. Maybe there was a mistake somewhere. - Evaldas Buinauskas
When I do this, I notice that no analyzer is mapped to my fields, even though I mapped the analyzer when creating my index. Only the type is mapped correctly. - runner2018
There you go. I think the issue is that you specified mapping, not mappings, during index creation. - Evaldas Buinauskas
I checked, and I actually used mappings. - runner2018
I will update my question and post the code I used to create the index at the end of my post. - runner2018
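
The check suggested in these comments can also be scripted. Below is a minimal sketch, assuming the GET /my_index response has already been fetched and parsed into a dict. The helper name fields_without_analyzer and the sample response are hypothetical; the sample only illustrates the situation described above, where the field type is present but no analyzer is attached.

```python
def fields_without_analyzer(index_info):
    """Return names of top-level text fields that have no explicit analyzer.

    `index_info` is the value stored under the index name in a parsed
    GET /<index> response (assumed shape, see the sample below).
    """
    mappings = index_info.get("mappings", {})
    # Typeless mappings keep "properties" directly under "mappings";
    # typed mappings (as in the question) nest it under a type such as "_doc".
    if "properties" in mappings:
        type_mappings = [mappings]
    else:
        type_mappings = list(mappings.values())

    missing = []
    for type_mapping in type_mappings:
        for name, field in type_mapping.get("properties", {}).items():
            if field.get("type") == "text" and "analyzer" not in field:
                missing.append(name)
    return missing


# Hypothetical response: "title" lost its analyzer, "description" kept it.
sample = {
    "my_index": {
        "mappings": {
            "_doc": {
                "properties": {
                    "title": {"type": "text"},
                    "description": {"type": "text", "analyzer": "rebuilt_english"},
                }
            }
        }
    }
}

print(fields_without_analyzer(sample["my_index"]))  # ['title']
```

Any field listed here would be analyzed with the index default rather than with rebuilt_english.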

3 Answers

0
votes

The problem is that the filter key, which holds all of your named filter definitions, is in the wrong place. You put it inside the analyzer definition, but it has to be a sibling of analyzer, directly under analysis.

So my bet is that the following config should work as expected:

{
  "settings":{
    "analysis":{
      "filter":{
        "english_stop":{
          "type":"stop",
          "stopwords":"_english_"
        },
        "english_stemmer":{
          "type":"stemmer",
          "language":"english"
        },
        "english_possessive_stemmer":{
          "type":"stemmer",
          "language":"possessive_english"
        }
      },
      "analyzer":{
        "rebuilt_english":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings":{
    "_doc":{
      "properties":{
        "title":{
          "type":"text",
          "analyzer":"rebuilt_english"
        },
        "description":{
          "type":"text",
          "analyzer":"rebuilt_english"
        }
      }
    }
  }
}
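
The placement rule described in this answer can be checked before the request is sent. This is a rough sketch, not part of any Elasticsearch client: check_index_body is a hypothetical helper, BUILT_IN_FILTERS lists only the one built-in filter used in this thread, and the two request bodies are trimmed-down stand-ins for the question's request and for a corrected one.

```python
# Token filters assumed to be built in; only the one used in this thread is listed.
BUILT_IN_FILTERS = {"lowercase"}


def check_index_body(body):
    """Return a list of structural problems found in an index-creation body."""
    problems = []
    analysis = body.get("settings", {}).get("analysis", {})
    defined_filters = set(analysis.get("filter", {}))

    for name, analyzer in analysis.get("analyzer", {}).items():
        filters = analyzer.get("filter", [])
        if not isinstance(filters, list):
            # A dict here means filter definitions were nested inside the analyzer.
            problems.append(f"analyzer '{name}': 'filter' must be a list of names")
            continue
        for filter_name in filters:
            if filter_name not in defined_filters and filter_name not in BUILT_IN_FILTERS:
                problems.append(f"analyzer '{name}': filter '{filter_name}' is not defined")
    return problems


stemmer = {"type": "stemmer", "language": "english"}

# Filter definitions nested inside the analyzer, as in the question's request.
misplaced = {
    "settings": {"analysis": {"analyzer": {"rebuilt_english": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": {"english_stemmer": stemmer},
    }}}}
}

# Filter definitions as a sibling of "analyzer", referenced by name.
corrected = {
    "settings": {"analysis": {
        "filter": {"english_stemmer": stemmer},
        "analyzer": {"rebuilt_english": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "english_stemmer"],
        }},
    }}
}

print(check_index_body(misplaced))
print(check_index_body(corrected))  # []
```

The first call reports one problem for rebuilt_english; the second reports none.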
0
votes
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_stop": {
          "type": "standard",
          "stopwords": "_english_"
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  }
}

POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "I'm in the mood for drinking semi-dry wine!"
}

I think this will help. Thanks.
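
To connect the _analyze call above with the original search problem: a query term can only hit a document if the two produce at least one identical token after analysis. Here is a small sketch of that comparison, assuming two parsed _analyze responses are available. The helper names are hypothetical, and the sample dicts are hand-written stand-ins that keep only the fields the code reads; they are not captured Elasticsearch output.

```python
def tokens(analyze_response):
    """Extract the token strings from a parsed _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def would_match(indexed_response, query_response):
    """True if any analyzed query token equals an analyzed document token."""
    return bool(set(tokens(indexed_response)) & set(tokens(query_response)))


# Stand-in for analyzing the indexed word "covers" with a working stemmer.
stemmed_doc = {"tokens": [{"token": "cover", "position": 0}]}
# Stand-in for analyzing the same word when no stemmer is applied.
unstemmed_doc = {"tokens": [{"token": "covers", "position": 0}]}
# Stand-in for analyzing the search term "cover".
query = {"tokens": [{"token": "cover", "position": 0}]}

print(would_match(stemmed_doc, query))    # True
print(would_match(unstemmed_doc, query))  # False
```

The second case mirrors the symptom in the question: without stemming at index time, only the exact term matches.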

0
votes

The analyzer below should work. The fix: when you define "tokenizer": "standard" for the analyzer, do not also set "type": "standard" on it.

PUT /analyzers_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_stemmer",
            "lowercase"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}