Create keyword string type with custom analyzer in 5.3.0

Question

I have a string I'd like to index as a keyword type, but with a special comma analyzer. For example:

"San Francisco, Boston, New York" -> "San Francisco", "Boston", "New York"

The field should be both indexed and aggregatable at the same time, so that I can split it up into buckets. Before 5.0.0 the following worked. Index settings:

{
     'settings': {
         'analysis': {
             'tokenizer': {
                 'comma': {
                     'type': 'pattern',
                     'pattern': ','
                 }
             },
             'analyzer': {
                'comma': {
                     'type': 'custom',
                     'tokenizer': 'comma'
                 }
             }
         },
     },
}

with the following mapping:

{
    'city': {
        'type': 'string',
        'analyzer': 'comma'
    },
}

Now, in 5.3.0 and above, analyzer is no longer a valid property for the keyword type, and my understanding is that I want a keyword type here. How do I specify an aggregatable, indexed, searchable text type with a custom analyzer?

Comment from Val: In 5.2 and above, keyword fields can have normalizers, but those only allow specific token filters and char filters, not a tokenizer, so that approach won't work here. Do you have any way to split that string on the client side before sending it to ES?
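The client-side split suggested in the comment could look roughly like the sketch below. It assumes the city field is mapped as a plain keyword and that the document is sent with city as an array, in which case each element is indexed as its own term. The helper names split_cities and build_document are made up for illustration.

```python
import json


def split_cities(raw):
    """Split a comma-separated string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_document(raw):
    # Hypothetical document shape: "city" is assumed to be a keyword field,
    # so every list element becomes a separate, aggregatable term.
    return {"city": split_cities(raw)}


doc = build_document("San Francisco, Boston, New York")
print(json.dumps(doc))
# prints {"city": ["San Francisco", "Boston", "New York"]}
```

The printed JSON is the request body you would index. Because this version does no lowercasing, a term query would have to match the original casing exactly.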

user3775217 · Accepted Answer · 2017-04-18T05:09:06

You can use multi-fields to index the same field in two different ways: one for searching and the other for aggregations.

I also suggest adding lowercase and trim filters to the tokens produced, which makes searching more reliable.

Mappings

PUT commaindex2
    {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "comma": {
                        "type": "pattern",
                        "pattern": ","
                    }
                },
                "analyzer": {
                    "comma": {
                        "type": "custom",
                        "tokenizer": "comma",
                        "filter": ["lowercase", "trim"]
                    }
                }
            }
        },
        "mappings": {
            "city_document": {
                "properties": {
                    "city": {
                        "type": "keyword",
                        "fields": {
                            "city_custom_analyzed": {
                                "type": "text",
                                "analyzer": "comma",
                                "fielddata": true
                            }
                        }
                    }
                }
            }
        }
    }
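To get a feel for which terms the comma analyzer in these settings would put into the index, here is a rough Python emulation. It is not Elasticsearch itself, only an approximation of "split on the pattern, then lowercase, then trim". The function name comma_analyze is made up for illustration, and dropping empty tokens is a simplification. The _analyze API on the real index is the authoritative way to check.

```python
import re


def comma_analyze(text):
    """Approximate the custom 'comma' analyzer: pattern tokenizer + filters."""
    # Pattern tokenizer: split the input on the regular expression ",".
    tokens = re.split(",", text)
    # Token filters, applied in the order listed: lowercase, then trim.
    tokens = [token.lower().strip() for token in tokens]
    # Simplification: discard tokens that end up empty.
    return [token for token in tokens if token]


print(comma_analyze("San Francisco, Boston, New York"))
# prints ['san francisco', 'boston', 'new york']
```

This also shows why the term queries here use the lowercase value "new york". A term query is not analyzed, so its value has to equal one of the indexed tokens exactly.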

Index Document

POST commaindex2/city_document
{
  "city" : "san francisco, new york, london"
}

Search Query

POST commaindex2/city_document/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "city.city_custom_analyzed": {
                        "value": "new york"
                    }
                }
            }]
        }
    },
    "aggs": {
        "terms_agg": {
            "terms": {
                "field": "city",
                "size": 10
            }
        }
    }
}

Note

If you want to aggregate on the individual tokens, for example to count documents per city in buckets, run the terms aggregation on the city.city_custom_analyzed field instead of city:

POST commaindex2/city_document/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "city.city_custom_analyzed": {
                        "value": "new york"
                    }
                }
            }]
        }
    },
    "aggs": {
        "terms_agg": {
            "terms": {
                "field": "city.city_custom_analyzed",
                "size": 10
            }
        }
    }
}

Hope this helps
