I am new to Elasticsearch and would like to provide "search as you type" functionality. The text to be searched is no longer than 50 characters per field. The search should find all documents that contain the search text, similar to a wildcard query à la '*query*', but a wildcard query is very expensive.
That's why I tried to follow the approach described in this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html. The only difference in my case is that I want to use the 'n-gram' analyzer instead of the 'edge n-gram' analyzer.
I have created the following custom analyzers:
"settings": {
  "index": {
    "max_ngram_diff": "50",
    [...]
    "analysis": {
      "filter": {
        "3-50-grams-filter": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "50"
        }
      },
      "analyzer": {
        "index-3-50-grams-analyzer": {
          "filter": [
            "lowercase",
            "3-50-grams-filter"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        },
        "search-3-50-grams-analyzer": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
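To make clear what I expect these two analyzers to produce, here is a minimal Python sketch of the intended behavior. It is only a model of the configuration above (keyword tokenizer, lowercase filter, 3-50 n-grams), not Elasticsearch itself, and the function names `index_tokens` and `search_tokens` are made up for illustration.

```python
def index_tokens(text, min_gram=3, max_gram=50):
    """Model of index-3-50-grams-analyzer: the keyword tokenizer keeps the
    whole input as one token, lowercase normalizes it, and the n-gram filter
    emits every substring with a length between min_gram and max_gram."""
    lowered = text.lower()
    longest = min(max_gram, len(lowered))
    return [
        lowered[start:start + size]
        for size in range(min_gram, longest + 1)
        for start in range(len(lowered) - size + 1)
    ]


def search_tokens(text):
    """Model of search-3-50-grams-analyzer: one lowercased token, no n-grams."""
    return [text.lower()]


tokens = index_tokens("1107811#1OMAH0RN03D2")
print(len(tokens))                                  # number of n-grams emitted
print(search_tokens("MAH0RN03D2")[0] in tokens)     # is the search token indexed?
```

With this setup, any substring of at least 3 characters should be present as an index token, so a single lowercased search token should be enough to find the document.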
I have created the following mapping:
"mappings": {
  "dynamic": "strict",
  "properties": {
    "my-field": {
      "type": "text",
      "fields": {
        "my-field": {
          "type": "text",
          "analyzer": "index-3-50-grams-analyzer",
          "search_analyzer": "search-3-50-grams-analyzer"
        },
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
I post the following data:
{
  "my-field": "1107811#1OMAH0RN03D2"
}
Sending the following to the Analyze API:
{
  "text" : "1107811#1OMAH0RN03D2",
  "field" : "my-field"
}
I get the following result:
{
  "tokens": [
    {
      "token": "1107811",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<NUM>",
      "position": 0
    },
    {
      "token": "1omah0rn03d2",
      "start_offset": 8,
      "end_offset": 20,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
- It seems that the search_analyzer (although defined in the field mapping) is not applied automatically.
- Even if I specify the search_analyzer in the query, I do not get the expected results.
A query like this finds the document:
"query": {
  "match": {
    "my-field": {
      "query": "1OMAH0RN03D2"
    }
  }
}
...but a query like this does not (I only removed the first character):
"query": {
  "match": {
    "my-field": {
      "query": "OMAH0RN03D2"
    }
  }
}
...and neither does a query with an explicit search analyzer (with one more character removed):
"query": {
  "match": {
    "my-field": {
      "query": "MAH0RN03D2",
      "analyzer": "search-3-50-grams-analyzer"
    }
  }
}
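To put the observed results next to the expected ones, here is a small Python model (again, not Elasticsearch itself). It compares the three queries against the two tokens the Analyze API actually returned for "my-field" and against the 3-50 n-grams I expected to be indexed. It assumes each query is reduced to a single lowercased token; the helper names are hypothetical.

```python
DOC = "1107811#1OMAH0RN03D2"

# Tokens copied from the Analyze API response for "my-field".
ANALYZE_API_TOKENS = {"1107811", "1omah0rn03d2"}


def ngram_tokens(text, min_gram=3, max_gram=50):
    """All lowercased substrings with a length between min_gram and max_gram."""
    lowered = text.lower()
    longest = min(max_gram, len(lowered))
    return {
        lowered[start:start + size]
        for size in range(min_gram, longest + 1)
        for start in range(len(lowered) - size + 1)
    }


def matches(query, indexed_tokens):
    """Assumption: the query becomes one lowercased token that must be indexed."""
    return query.lower() in indexed_tokens


QUERIES = ["1OMAH0RN03D2", "OMAH0RN03D2", "MAH0RN03D2"]
results = {
    q: (matches(q, ANALYZE_API_TOKENS), matches(q, ngram_tokens(DOC)))
    for q in QUERIES
}
for query, (against_returned, against_ngrams) in results.items():
    print(query, against_returned, against_ngrams)
```

In this model only the first query matches the tokens returned by the Analyze API, which is the behavior I see, while all three would match if the n-grams were the tokens actually being searched.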
Does anyone have any idea what might be causing this behavior?
{ "query": { "match": { "my-field.my-field": { "query": "1#1OMAH0RN" } } } }
– aahrendt