There are similar questions asked here Elasticsearch Map case insensitive to not_analyzed documents, however mine is a slightly different because I deal with special characters.
Most people recommend using a keyword analyzer
combined with lowercase filter
. However, this does not work for my case because keyword analyzer tokenizes on spaces, and special characters like ^, #, etc
. which break the type of support I'm going for.
i.e.
^HELLOWORLD
should be matched by searching^helloworld
, but nothelloworld
#FooBar
should be matched by#foobar
but notfoobar
.Foo Bar
should be matched byfoo bar
, but notfoo
orbar
.
Similar functionality with what we see here https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html#_term_filter_with_numbers, but with case insensitivity.
Does anyone know how to accomplish this?
EDIT 1:
It seems the core of my problem was with multi-field, as keyword+lowercase seems to resolve the question posed in the title. However, it would be more accurate to pose this question for a multi-field value property.
test_mapping.json:
{
"properties" : {
"productID1" : {
"type" : "string",
"index_analyzer" : "keyword_lowercase",
"search_analyzer" : "keyword_lowercase"
},
"productID2" : {
"type": "multi_field",
"keyword_edge_ID": {
"type": "string",
"index_analyzer":"keyword_lowercase_edge",
"search_analyzer":"keyword_lowercase_edge"
},
"productID2": {
"type": "string",
"index": "analyzed",
"store": "yes",
"index_analyzer":"keyword_lowercase",
"search_analyzer":"keyword_lowercase"
}
}
}
}
test.json:
{
"index": {
"analysis": {
"filter":{
"edgengramfilter": {
"type": "edgeNgram",
"side": "front",
"min_gram": 1,
"max_gram": 32
}
},
"analyzer": {
"keyword_lowercase" : {
"type" : "custom",
"tokenizer": "keyword",
"filter": "lowercase"
},
"keyword_lowercase_edge": {
"tokenizer": "keyword",
"filter": ["lowercase", "edgengramfilter"]
}
}
}
}
}
Shell script to create index with mappings:
#!/bin/sh
ES_URL="http://localhost:9200"
curl -XDELETE $ES_URL/test
curl -XPOST $ES_URL/test/ --data-binary @test.json
curl -XPOST $ES_URL/test/query/_mapping --data-binary @test_mapping.json
POST localhost:9200/test/query
:
{
"productID1" : "^A",
"productID2" : "^A"
}
I'd like it so that I can match against productID2 with "^A", but it is returning no results right now, but it works when I do the same query against productID1. {"query": { "match": { "productID2": "^A" }}}