Can I test a custom Elasticsearch analyzer/tokenizer without adding it to an index first? Something like:

GET _analyze
{
  "tokenizer": {
        "my_custom_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
          "letter", "digit", "symbol"

          ]
        }
      },
  "text" : "this is a test"
}

I am able to test it by first creating an index that defines the new analyzer:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "tokenizer": "my_custom_tokenizer"
        }
      },
      "tokenizer": {
        "my_custom_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 30,
          "token_chars": [
          "letter", "digit", "symbol", "punctuation", "whitespace"

          ]
        }
      }
    }
  }
}
'

and then running the analyzer against that index:

curl -X POST "localhost:9200/my_index/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "my_custom_analyzer",
  "text": "testing"
}
'

Can I avoid this two-step process?

1 Answer

As far as I'm aware, older versions of Elasticsearch such as 2.x don't accept an inline tokenizer definition (an object rather than a name) in the _analyze request, but 5.x and later do.

You're almost there with your existing JSON request. Remove the "my_custom_tokenizer" wrapper object and put its configuration directly under "tokenizer", like so:

GET _analyze
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 10,
    "token_chars": ["letter", "digit", "symbol"]
  },
  "text": "this is a test"
}
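
Since the question uses curl rather than the Kibana console, here is a rough sketch of the same one-step request as a shell script. It assumes Elasticsearch 5.x or later on localhost:9200, as in the question. The names ES_URL, build_body and analyze_inline are made up for illustration and are not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch 5.x+ is reachable here (same host/port as in the question).
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: print an _analyze request body with the tokenizer
# defined inline, so no index or named analyzer has to exist first.
# The text is interpolated as-is, so it must not contain double quotes
# or backslashes.
build_body() {
  cat <<EOF
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 10,
    "token_chars": ["letter", "digit", "symbol"]
  },
  "text": "$1"
}
EOF
}

# Hypothetical helper: send the body to the index-less _analyze endpoint.
analyze_inline() {
  curl -s -X POST "$ES_URL/_analyze" \
    -H 'Content-Type: application/json' \
    -d "$(build_body "$1")"
}
```

After sourcing the script, calling analyze_inline "this is a test" should return the token list as JSON. Note the URL has no index name in it, which is what removes the first of the two steps.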