47
votes

I've got a field in an ElasticSearch field which I do not want to have analyzed, i. e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters.

If I do not give an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that.

Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed?

I only create the index, I don't do anything else. I can use analyzers like "english" for other fields which seems to be built-in names for pre-configured analyzers. Is there a list of other names? Maybe there's one fitting my needs (namely doing nothing with the input).

This is my mapping currently:

{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string" }
    }
  }
}

my_field1 is language-dependent; this seems to work. my_field2 shall be verbatim. I'd like to give an analyzer there which simply does not do anything.

A sample value for my_field2 would be "B45c 14/04".

3

3 Answers

58
votes
"my_field2": {
    "properties": {
        "title": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}

Check you here, https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html, for further info.

49
votes

This is no longer true due to the removal of the string (replaced by keyword and text) type as described here. Instead you should use keyword type with "index": true | false.

For Example OLD:

{
  "foo": {
    "type" "string",
    "index": "not_analyzed"
  }
}

becomes NEW:

{
  "foo": {
    "type" "keyword",
    "index": true
  }
}

This means the field is indexed but as it is typed as keyword not analyzed implicitly. If you would like to have the field analyzed, you need to use text type.

3
votes

keyword analyser can be also used.

// don't actually use this, use "index": "not_analyzed" instead
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string", "analyzer": "keyword" }
    }
  }
}

As noted here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html, it makes more sense to mark those fields as not_analyzed.

But keyword analyzer can be useful when it is set by default for whole index.

UPDATE: As it said in comments, string is no longer supported in 5.X