Multiple tokenizers inside one Custom Analyser in Elasticsearch

Question

I am using Custom NGRAM Analyzer which has a ngram tokenizer. I have also used lowercase filter. The query is working fine for searches without characters. But when I am searching for certain symbols, it fails. Since I have used lower case tokenizers, Elasticsearch doesn't analyse symbols. I know whitespace tokenizer can help me solve the issue. How can I use two tokenizers in a single analyzer?Below is the mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer":"my_tokenizer",
          "filter":"lowercase"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter", 
            "digit"
          ]
        }
      }
    }
  },
    "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }

}

Is there a way I could solve this issue?

Opster ES Ninja - Kamal Opster ES Ninja - Kamal · Accepted Answer · 2018-09-28T07:32:55

As per the documentation of elasticsearch,

An analyzer must have exactly one tokenizer.

However, you can have multiple analyzer defined in settings, and you can configure separate analyzer for each field.

If you want to have single field itself to be used using different analyzer, one of the option is to make that field multi-field as per this link

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "whitespace"
          "fields": {
            "ngram": { 
              "type":  "text",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}

So if you configure as above your query need to make use of title and title.ngram fields.

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "search @#$ whatever",
      "fields": [ 
        "title",
        "title.ngram"
      ],
      "type": "most_fields" 
    }
  }
}

As another option, here is what you can do

Create two indexes.
The first index have field title with analyzer my_analyzer
Second index have field title with analyzer whitespace
Have same alias created for both of them as below

Execute the below:

POST _aliases
{  
   "actions":[  
      {  
         "add":{  
            "index":"index A",
            "alias":"index"
         }
      },
      {  
         "add":{  
            "index":"index B",
            "alias":"index"
         }
      }
   ]
}

So when you eventually write a query, it must be pointing to this alias which in turn would be querying multiple indexes.

Hope this helps!

Multiple tokenizers inside one Custom Analyser in Elasticsearch

3 Answers