
I built an Elasticsearch index with a custom analyzer that combines the keyword tokenizer with the lowercase filter and a custom word_delimiter filter.

"merged_analyzer": {
   "type": "custom",
   "tokenizer": "keyword",
   "filter": [
     "lowercase",
     "asciifolding",
     "word_delim",
     "trim"
   ]
},
"merged_search_analyzer": {
    "type": "custom",
    "tokenizer": "keyword",
    "filter": [
      "lowercase",
      "asciifolding"
    ]
}

"word_delim": {
   "type": "word_delimiter",
   "catenate_words": true,
   "generate_word_parts": false,
   "generate_number_parts": false,
   "preserve_original": true
}

"properties": {
  "lastName": {
    "type": "keyword",
    "normalizer": "keyword_normalizer",
    "fields": {
      "merged": {
        "type": "text",
        "analyzer": "merged_analyzer",
        "search_analyzer": "merged_search_analyzer"
      }
    }
  }
}
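For completeness, here is how the fragments above fit together in a single index-creation request. The index name my_index is a placeholder, and the keyword_normalizer definition is my guess (it is referenced in the mapping but its definition isn't shown above):

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "word_delim": {
          "type": "word_delimiter",
          "catenate_words": true,
          "generate_word_parts": false,
          "generate_number_parts": false,
          "preserve_original": true
        }
      },
      "normalizer": {
        "keyword_normalizer": {
          "type": "custom",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "analyzer": {
        "merged_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding", "word_delim", "trim"]
        },
        "merged_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "lastName": {
          "type": "keyword",
          "normalizer": "keyword_normalizer",
          "fields": {
            "merged": {
              "type": "text",
              "analyzer": "merged_analyzer",
              "search_analyzer": "merged_search_analyzer"
            }
          }
        }
      }
    }
  }
}
```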

Then I searched the .merged field for documents containing dash-separated sub-words, e.g. 'Abc-Xyz'. Both 'abc-xyz' and 'abcxyz' (in lowercase) match, which is exactly what I expected, but I also want matches for queries with uppercase letters or trailing whitespace (e.g. 'Abc-Xyz', 'abc-xyz ').
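To make my expectation concrete, here is a rough Python sketch of the tokens I believe merged_analyzer should emit. It only approximates the keyword tokenizer plus lowercase, word_delimiter (catenate_words, preserve_original) and trim; it is not Elasticsearch itself:

```python
import re

def expected_tokens(text):
    # keyword tokenizer: the whole input is a single token;
    # lowercase + trim filters approximated by lower() and strip()
    token = text.lower().strip()
    # word_delimiter: split on non-alphanumeric characters
    parts = [p for p in re.split(r"[^a-z0-9]+", token) if p]
    tokens = {token}                 # preserve_original: true
    if len(parts) > 1:
        tokens.add("".join(parts))   # catenate_words: true
    return tokens

# Both the hyphenated and the concatenated form should be indexed:
print(sorted(expected_tokens("Abc-Xyz")))   # ['abc-xyz', 'abcxyz']
print(sorted(expected_tokens("abc-xyz ")))  # ['abc-xyz', 'abcxyz']
```

So in my understanding, a lowercased-and-trimmed query like 'Abc-Xyz' or 'abc-xyz ' should end up matching either indexed token.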

It seems like the trim and lowercase filters have no effect in my analyzer.

Any idea what I could be doing wrong?

I'm using Elasticsearch 6.2.4.


1 Answer


I'm not sure, but it might be that the search analyzer is different from the index analyzer. There are two things you can do to check this:

  1. Configure a search_analyzer (https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-analyzer.html) that analyzes queries with your merged_analyzer.

  2. Use the Analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-analyze.html) to check whether your search tokens are what you expect.
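For example, you can run the same input through both analyzers and compare the output (my_index is a placeholder for your index name):

```json
GET /my_index/_analyze
{
  "analyzer": "merged_analyzer",
  "text": "Abc-Xyz"
}

GET /my_index/_analyze
{
  "analyzer": "merged_search_analyzer",
  "text": "Abc-Xyz"
}
```

If the two token lists differ in a way that prevents matching (e.g. the search side never produces the concatenated form 'abcxyz'), that points at the mismatch.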