
Is it possible to create a custom Elasticsearch analyzer that splits the input on a space and then produces two tokens: one with everything before the space, and one with the full text? For example, I have a record with a field containing the text '35 G'. I want to retrieve that record by querying either '35' or '35 G' against that field, so Elasticsearch should produce exactly two tokens: ['35', '35 G'] and no more.

If it's possible, how can I achieve it?


1 Answer


This is doable using the path_hierarchy tokenizer, which emits one token per delimiter level, from the first component up to the full string.

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": " "
        }
      }
    }
  }
}

And now

POST test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "35 G"
}

outputs

{
  "tokens": [
    {
      "token": "35",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "35 G",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}
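To use this at query time, the analyzer has to be attached to a field in the mapping. Below is a minimal sketch, assuming a field named my_field (that name is not from the question) and the typeless mapping format of Elasticsearch 7.x+. Setting search_analyzer to keyword keeps the query string as a single token, so '35' matches the '35' token and '35 G' matches the '35 G' token, without the query itself being split into hierarchy tokens:

```json
PUT test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": " "
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}

PUT test2/_doc/1
{
  "my_field": "35 G"
}

GET test2/_search
{
  "query": {
    "match": {
      "my_field": "35"
    }
  }
}
```

The same search with "35 G" as the query string should also return the document, since the full text is indexed as its own token.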