I am new to Elasticsearch and would like to provide "search as you type" functionality. The text to be searched is no longer than 50 characters per field. The search should find all documents that contain the search text anywhere in the field, similar to a wildcard query à la '*query*'. However, such wildcard queries are very expensive.

That is why I tried to follow the approach described in this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html. The only difference in my case is that I want to use an 'n-gram' analyzer instead of the 'edge n-gram' analyzer.

I have created the following custom analyzers:

"settings": {
    "index": {
        "max_ngram_diff": "50",
        [...]
        "analysis": {
            "filter": {
                "3-50-grams-filter": {
                    "type": "ngram",
                    "min_gram": "3",
                    "max_gram": "50"
                }
            },
            "analyzer": {
                "index-3-50-grams-analyzer": {
                    "filter": [
                        "lowercase",
                        "3-50-grams-filter"
                    ],
                    "type": "custom",
                    "tokenizer": "keyword"
                },
                "search-3-50-grams-analyzer": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "keyword"
                }
            }
        }
    }
}

I have created the following mapping:

"mappings": {
    "dynamic": "strict",
    "properties": {
        "my-field": {
            "type": "text",
            "fields": {
                "my-field": {
                    "type": "text",
                    "analyzer": "index-3-50-grams-analyzer",
                    "search_analyzer": "search-3-50-grams-analyzer"
                },
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        [...]
    }
}

I index the following document:

{
    "my-field": "1107811#1OMAH0RN03D2"
}

Sending the following request to the Analyze API:

{
    "text" : "1107811#1OMAH0RN03D2",
    "field" : "my-field"
}

returns the following result:

{
    "tokens": [
        {
            "token": "1107811",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<NUM>",
            "position": 0
        },
        {
            "token": "1omah0rn03d2",
            "start_offset": 8,
            "end_offset": 20,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

  1. It seems that the search_analyzer, although defined in the field mapping, is not applied automatically.
  2. Even if I specify the search analyzer explicitly in the query, I do not get the expected results.

A query like this finds the document:

"query": {
    "match": {
        "my-field": {
            "query": "1OMAH0RN03D2"
        }
    }
}

...but a query like this does not (I only removed the first character):

"query": {
    "match": {
        "my-field": {
            "query": "OMAH0RN03D2"
        }
    }
}

...and neither does a query with an explicit search analyzer (with one more character removed):

"query": {
    "match": {
        "my-field": {
            "query": "MAH0RN03D2",
            "analyzer": "search-3-50-grams-analyzer"
        }
    }
}

Does anyone have any idea what might be causing this behavior?

It's working for me, so it looks like you are missing something. Please refer to my answer and let me know if you have further questions. - user156327

@OpsterElasticsearchNinja: Thank you for your example. It let me compare your setup with my configuration and mapping, and that is how I found the error. In my mapping the field is defined as a multi-field, and the analyzers are assigned one level deeper, on the sub-field. This means I had to adjust my search as follows to make it work: { "query": { "match": { "my-field.my-field": { "query": "1#1OMAH0RN" } } } } - aahrendt

1 Answer

I am not sure what is going wrong on your side, but I tried this with your sample document and index settings and it worked fine for me. Below are the steps I followed.

Index mapping and settings

{
    "settings": {
        "index": {
            "max_ngram_diff": "50",
            "analysis": {
                "filter": {
                    "3-50-grams-filter": {
                        "type": "ngram",
                        "min_gram": "3",
                        "max_gram": "50"
                    }
                },
                "analyzer": {
                    "index-3-50-grams-analyzer": {
                        "filter": [
                            "lowercase",
                            "3-50-grams-filter"
                        ],
                        "type": "custom",
                        "tokenizer": "keyword"
                    },
                    "search-3-50-grams-analyzer": {
                        "filter": [
                            "lowercase"
                        ],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "myfield": {
                "type": "text",
                "analyzer": "index-3-50-grams-analyzer",
                "search_analyzer": "search-3-50-grams-analyzer"
            }
        }
    }
}
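
As a rough illustration of what the 3-50-grams-filter in these settings produces at index time, the Python sketch below counts the n-grams emitted for a single keyword token. It is not Elasticsearch code: the helper name ngram_count is made up for this example, and it assumes the filter emits every substring whose length lies between min_gram and max_gram. The index.max_ngram_diff setting caps the allowed value of max_gram - min_gram (47 here), which is why it has to be raised from its default for this filter.

```python
def ngram_count(length, min_gram=3, max_gram=50):
    """Count the n-grams emitted for one token of the given length.

    Hypothetical helper: assumes every substring with a length between
    min_gram and max_gram is emitted, one per start position.
    """
    longest = min(max_gram, length)
    return sum(length - n + 1 for n in range(min_gram, longest + 1))


# The sample value "1107811#1OMAH0RN03D2" has 20 characters.
print(ngram_count(20))  # 171
# Worst case mentioned in the question: a 50-character field value.
print(ngram_count(50))  # 1176
```

These are counts of emitted tokens, not of distinct terms, so the number of unique terms stored per value can be lower when substrings repeat.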

Index sample doc

{
  "myfield" : "1107811#1OMAH0RN03D2"
}

Search query

{
    "query": {
        "match": {
            "myfield": {
                "query": "OMAH0RN03D2"
            }
        }
    }
}

Search result

  "hits": [
      {
        "_index": "edgesearch",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.4848835,
        "_source": {
          "myfield": "1107811#1OMAH0RN03D2"
        }
      }
    ]

Edit: Based on the comments, the OP was using a multi-field, and the analyzers were defined one level deeper, on the sub-field. Queries against the parent field therefore never used them, which caused the issue. Querying the sub-field explicitly (my-field.my-field) solved it.
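
To make the difference between the parent field and the sub-field concrete, here is a small Python simulation of the two analysis chains. It is only a sketch and does not call Elasticsearch: the function names are invented for this example, standard_like is a crude stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit), and a match query is modelled as "at least one query term is among the indexed terms".

```python
import re


def standard_like(text):
    """Crude stand-in for the standard analyzer (an assumption, not the real tokenizer)."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def index_ngrams(text, min_gram=3, max_gram=50):
    """keyword tokenizer + lowercase + ngram filter: all substrings of 3..50 chars."""
    s = text.lower()
    longest = min(max_gram, len(s))
    return {s[i:i + n] for n in range(min_gram, longest + 1)
            for i in range(len(s) - n + 1)}


def search_keyword(text):
    """keyword tokenizer + lowercase: the whole query becomes a single term."""
    return {text.lower()}


def matches(indexed_terms, query_terms):
    """Simplified match query: hit if any query term was indexed."""
    return bool(indexed_terms & query_terms)


doc = "1107811#1OMAH0RN03D2"
parent_terms = standard_like(doc)    # what "my-field" indexes (no analyzer configured)
subfield_terms = index_ngrams(doc)   # what "my-field.my-field" indexes

for query in ["1OMAH0RN03D2", "OMAH0RN03D2", "MAH0RN03D2"]:
    on_parent = matches(parent_terms, standard_like(query))
    on_subfield = matches(subfield_terms, search_keyword(query))
    print(f"{query}: my-field={on_parent}, my-field.my-field={on_subfield}")
```

Under these assumptions only the first query hits the parent field, because "1omah0rn03d2" is the only complete token it shares with the document, while all three queries hit the sub-field. Passing an analyzer in the query does not help on the parent field, since the indexed terms there are still the two standard tokens.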