
Designing an index in Elasticsearch so that "&" and "and" in a query return the same results

How can we make Elasticsearch return the same results whether the search was made with "and" in the query string or with an ampersand ("&")?

For example, suppose there is a query to find all movie titles containing "and" or "&" in their name:

  1. Mr. & Mrs. Smith
  2. Jack and Jill
  3. Abc and Def & ghi
  4. Dummy Name

So in this case it shouldn't matter whether the search is done with "and" or "&"; the query should return 1, 2, and 3.

Dump from my Kibana Dev Tools:

PUT test_index
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "movie_name": { "type": "text" }
      }
    }
  }
}

PUT /test_index/doc/1 { "movie_name":"Mr. & Mrs. Smith" }

PUT /test_index/doc/2 { "movie_name":"Jack and Jill" }

PUT /test_index/doc/3 { "movie_name":"Abc and Def & ghi" }

PUT /test_index/doc/4 { "movie_name":"Dummy Name" }

Both of the queries below should return the same result:

  1. GET test_index/_search { "size": 20, "query": { "match": { "movie_name": "&" } } }

  2. GET test_index/_search { "size": 20, "query": { "match": { "movie_name": "and" } } }


2 Answers


There are a couple of ways to do this:

  1. Use the english analyzer, which removes special characters from your text as well as stop words like "and"; in essence your searches are matched against tokens containing neither "&" nor "and", so both return the same results. See https://www.elastic.co/guide/en/elasticsearch/reference/6.4/analysis-lang-analyzer.html for the docs.

  2. Keep your standard analyzer and add a character-replacement filter that replaces every occurrence of the " & " pattern with " and "; then all those searches emit the same tokens. See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html for the docs.
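The second option can be sketched as index settings like the following (the filter and analyzer names here are illustrative, not from the original post; the pattern_replace char filter rewrites "&" to " and " before the standard tokenizer runs):

PUT test_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "amp_to_and": {
          "type": "pattern_replace",
          "pattern": "&",
          "replacement": " and "
        }
      },
      "analyzer": {
        "amp_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"],
          "char_filter": ["amp_to_and"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "movie_name": {
          "type": "text",
          "analyzer": "amp_analyzer"
        }
      }
    }
  }
}

With these settings, "Mr. & Mrs. Smith" and "Jack and Jill" both produce an "and" token at index time, and queries using either "&" or "and" analyze to the same token as well.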

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "+ => plus",
            "& => and"
          ]
        }
      }
    }
  }
}

To do this, you should create a char filter, as in the settings above.

When you create an index with the above analyzer, you can test it with the _analyze API:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "&"
}

Output:

{
  "tokens": [
    {
      "token": "and",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    }
  ]
}

If you run the same _analyze request without the above char filter, you will get:

{
  "tokens": []
}

Reason: in Elasticsearch, standalone symbols like "&" are stripped out by the standard analyzer that text fields use by default.
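You can see the stripping directly by running the standard analyzer by hand (a sketch; any index, or no index at all, works with the cluster-level _analyze endpoint):

POST _analyze
{
  "analyzer": "standard",
  "text": "Mr. & Mrs. Smith"
}

This yields only the tokens "mr", "mrs", and "smith" — the "&" is dropped at tokenization time, which is why a match query for "&" against a plain text field finds nothing.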