I built an Elasticsearch index using a custom analyzer that combines the keyword tokenizer with the lowercase filter and a custom word_delimiter filter:
"merged_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"asciifolding",
"word_delim",
"trim"
]
},
"merged_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"asciifolding"
]
}
"word_delim": {
"type": "word_delimiter",
"catenate_words": true,
"generate_word_parts": false,
"generate_number_parts": false,
"preserve_original": true
}
"properties": {
"lastName": {
"type": "keyword",
"normalizer": "keyword_normalizer",
"fields": {
"merged": {
"type": "text",
"analyzer": "merged_analyzer",
"search_analyzer": "merged_search_analyzer"
}
}
}
}
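For reference, the analyzer's token output can be inspected with the _analyze API (a sketch; the index name my_index is an assumption):

```
GET my_index/_analyze
{
  "analyzer": "merged_analyzer",
  "text": "Abc-Xyz"
}
```

The same request with "analyzer": "merged_search_analyzer" shows what a query string is turned into at search time.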
Then I searched for documents containing dash-separated sub-words, e.g. 'Abc-Xyz', using the .merged field. Both 'abc-xyz' and 'abcxyz' (all lowercase) match, which is exactly what I expected, but I also want the analyzer to match queries containing uppercase letters or trailing whitespace (e.g. 'Abc-Xyz', 'abc-xyz ').
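For illustration, the searches are of this form (a minimal sketch; the index name my_index is an assumption):

```
GET my_index/_search
{
  "query": {
    "match": {
      "lastName.merged": "Abc-Xyz"
    }
  }
}
```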
It seems as if the trim and lowercase filters have no effect in my analyzer. Any idea what I could be doing wrong?

I'm using Elasticsearch 6.2.4.