6 votes

I have documents that I want to index in ElasticSearch that contain a text field called name. I currently index the name using the snowball analyzer. However, I would like to match names both with and without included spaces. For example, a document with the name "The Home Depot" should match "homedepot", "home", and "home depot". Additionally, documents with a single-word name like "ExxonMobil" should match "exxon mobil" and "exxonmobil".

I can't seem to find the right combination of analyzer/filters to accomplish this.


2 Answers

5 votes

I think the most direct approach to this problem would be to apply a Shingle token filter, which, instead of creating ngrams of characters, creates combinations of incoming tokens. You can add it to your analyzer like this:

filter:
    ........
    my_shingle_filter:
        type: shingle
        min_shingle_size: 2      # combine at least 2 adjacent tokens
        max_shingle_size: 3      # ...and at most 3
        output_unigrams: true    # also emit the original single tokens
        token_separator: ""      # join with no space, so "home depot" becomes "homedepot"

You should be mindful of where this filter is placed in your filter chain. It should probably come late in the chain, after all token separation/removal/replacement has already occurred (i.e., after any StopFilters, SynonymFilters, stemmers, etc.).
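For context, here is a minimal sketch of what the complete index settings might look like in JSON form, with the shingle filter placed last in the chain as described above. The index name companies, the analyzer name name_analyzer, and the inclusion of the snowball filter are illustrative assumptions, not part of the original answer:

    PUT /companies
    {
        "settings": {
            "analysis": {
                "filter": {
                    "my_shingle_filter": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 3,
                        "output_unigrams": true,
                        "token_separator": ""
                    }
                },
                "analyzer": {
                    "name_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "snowball", "my_shingle_filter"]
                    }
                }
            }
        }
    }

You can check the output with the _analyze API:

    GET /companies/_analyze
    {
        "analyzer": "name_analyzer",
        "text": "The Home Depot"
    }

which should produce tokens along the lines of the, thehome, thehomedepot, home, homedepot, depot, so queries for "homedepot", "home", and "home depot" can all match.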

-3 votes

In this case, you might need to look at an ngram type solution.

Ngram does something like this:

Given the text abcd, analyzed with an ngram filter, you might get the tokens:

a
ab
abc
abcd
b
bc
bcd
c
cd
d

Below is a setting that might work for you.

You might need to tinker with the filter portion. This particular filter creates grams at least 2 and at most 12 characters long.

Now, if you need the further analysis that snowball gives you (like water, waters, and watering all matching the token water), you will need to tinker further.

        "filter": {
            "ngram_filter": {
                "type": "nGram",
                "min_gram": 2,
                "max_gram": 12
            }
        },
        "analyzer": {
            "ngram_index": {
                "filter": [
                    "lowercase",
                    "ngram_filter"
                ],
                "tokenizer": "keyword"
            },
            "ngram_search": {
                "filter": [
                    "lowercase"
                ],
                "tokenizer": "keyword"
            }
        }
    },
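To sanity-check the index-time tokens, you can run the analyzer through the _analyze API. The index name companies is an illustrative assumption; note also that newer Elasticsearch versions spell the filter type ngram rather than nGram:

    GET /companies/_analyze
    {
        "analyzer": "ngram_index",
        "text": "ExxonMobil"
    }

Because the keyword tokenizer emits the whole string as a single token, the lowercase filter turns it into exxonmobil, and the ngram filter then produces every 2- to 12-character substring, including exxon and mobil.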

The idea here is that at index time you want to create the right tokens so that they are available at search time. At search time, all you need to do is match against those tokens; you don't need to reapply the ngram analyzer.
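As a minimal sketch of how to wire this up, you would point the field's index-time and search-time analysis at the two analyzer definitions above in the mapping. The index and field names are illustrative, and this uses the mapping syntax of recent Elasticsearch versions (older versions used index_analyzer instead of analyzer):

    PUT /companies
    {
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "ngram_index",
                    "search_analyzer": "ngram_search"
                }
            }
        }
    }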

EDIT:

One last thing I just noticed is this requirement: "ExxonMobil" should match "exxon mobil".

That probably means you will need to do something like this:

            "ngram_search": {
                "filter": [
                    "lowercase"
                ],
                "tokenizer": "whitespace"

            }

Note the use of the "whitespace" tokenizer instead of keyword; this allows the search to split the query on whitespace.
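With that in place, a plain match query should cover the original requirement. This is a sketch under the same assumptions as above; the index and field names are illustrative:

    GET /companies/_search
    {
        "query": {
            "match": {
                "name": "exxon mobil"
            }
        }
    }

The whitespace tokenizer splits the query into exxon and mobil, both of which exist among the 2- to 12-character grams indexed for "ExxonMobil", so the document matches.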