
Using Elasticsearch and NEST 2.x

Based on some (crazy) user requirements I need to copy all searchable fields to a single field, lowercase it, and ignore spaces. When the user types in something to search, I lowercase it and remove spaces to use as the search string.

As an example: given "The quick brown fox", I want Elasticsearch to treat it as "thequickbrownfox" for search purposes.

The following searches should match the above document:

  • the
  • thequick
  • t
  • rown
  • nf
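The intended matching rule can be sketched in plain C#, independent of Elasticsearch: lowercase both sides, drop whitespace, then test for a substring. The `SearchNormalizer` class and its method names are hypothetical, introduced only to illustrate the requirement; this is not how Elasticsearch evaluates the query.

```csharp
using System;
using System.Linq;

public static class SearchNormalizer
{
    // Lowercase the input and drop every whitespace character.
    public static string Normalize(string text) =>
        new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray()).ToLowerInvariant();

    // True when the normalized query occurs anywhere in the normalized document.
    public static bool Matches(string document, string query) =>
        Normalize(document).Contains(Normalize(query));
}
```

Under this rule, each of the five searches listed above matches "The quick brown fox", because each one is a substring of "thequickbrownfox".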

Here's how I'm building the index:

var customerSearchIdxDesc = new CreateIndexDescriptor(Constants.ElasticSearch.CustomerSearchIndexName)
    .Settings(f =>
        f.Analysis(analysis => analysis
                .Analyzers(analyzers => analyzers
                    .Custom(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram, a => a
                        .Filters("lowercase")
                        .Tokenizer(Constants.ElasticSearch.TokenizerNames.NoWhitespaceNGram)))
                .Tokenizers(tokenizers => tokenizers
                        .NGram(Constants.ElasticSearch.TokenizerNames.NoWhitespaceNGram, t => t
                            .MinGram(1)
                            .MaxGram(500)
                            .TokenChars(TokenChar.Digit, TokenChar.Letter, TokenChar.Punctuation, TokenChar.Symbol)
                        )
                )
        )
    )
    .Mappings(ms => ms.Map<ServiceModel.DtoTypes.Customer.SearchResult>(m => m
        .AutoMap()
        .Properties(p => p
            .String(n => n.Name(c => c.CustomerName).CopyTo(f =>
            {
                return new FieldsDescriptor<string>().Field("search");
            }).Index(FieldIndexOption.Analyzed).Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.ContactName)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.CustomerName)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.City)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.StateAbbreviation)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.Country)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.PostalCode)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(Constants.ElasticSearch.CombinedSearchFieldName)
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            )
        )
    );

As you can see, I'm using the lowercase filter on the analyzer, and using TokenChars so that whitespace is omitted (well, that's the idea; it isn't working).
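One possible explanation, sketched below as a simplified model rather than actual Elasticsearch code: when the token characters exclude whitespace, the n-gram tokenizer treats whitespace as a boundary, so n-grams are built per word and never span two words. The `NGramSketch` class is hypothetical, and lowercasing is applied up front here only for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NGramSketch
{
    // All substrings of `word` whose length is between minGram and maxGram.
    public static IEnumerable<string> NGrams(string word, int minGram, int maxGram)
    {
        for (var start = 0; start < word.Length; start++)
            for (var len = minGram; len <= maxGram && start + len <= word.Length; len++)
                yield return word.Substring(start, len);
    }

    // Rough model of an n-gram tokenizer whose token characters exclude whitespace:
    // the text is split on whitespace first, then each piece is n-grammed on its own.
    public static HashSet<string> TokenizePerWord(string text, int minGram, int maxGram) =>
        new HashSet<string>(
            text.ToLowerInvariant()
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .SelectMany(w => NGrams(w, minGram, maxGram)));
}
```

In this model, `TokenizePerWord("The quick brown fox", 1, 500)` contains "rown" and "the" but not "theq" or "nf", which would be consistent with searches across words finding nothing.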

Here's what I'm using to search:

var response = client.Search<DtoTypes.Customer.SearchResult>(s =>
    s.From(0)
    .Take(Constants.ElasticSearch.MaxResults)
    .Query(q => q
        .MatchPhrase(mp => mp
            .Field(Constants.ElasticSearch.CombinedSearchFieldName)
            .Query(query))));

Here are the issues:

  • Whitespace does not appear to be omitted (it looks like it's only matching on individual words).
  • Partial matching appears to work only on the suffix. For example, searching "aby" will not match "abyss", but "yss" will.
  • Searching across words isn't working. For "the quick", searching "theq" matches nothing.
The _all field may help you here - elastic.co/guide/en/elasticsearch/reference/current/…. You can also set up your own analysis on it too. – Russ Cam
Additionally, you can map fields as multi_fields, apply different analysis to the sub-fields, and then combine queries across those fields with bool queries, using features such as boosting, rescoring, etc. to control relevancy. – Russ Cam
I may be wrong, but I would have thought it would be more performant to use CopyTo, so that when searching it only has to look at one field. – Chris Klepeis
Querying against one field is likely faster, but some queries cannot be expressed against a single field and require combining multiple queries across many different fields. I'm not suggesting that this is necessary for your question; I just wanted to highlight multi_field in case it helps with other scenarios. – Russ Cam

1 Answer


I believe this solves my issues: I added a character filter that strips spaces, attached it to the analyzer, and switched to an EdgeNGram tokenizer. I have no idea whether this is the optimal setup, but it appears to work.

var customerSearchIdxDesc = new CreateIndexDescriptor(Constants.ElasticSearch.CustomerSearchIndexName)
    .Settings(f =>
        f.Analysis(analysis => analysis
                .CharFilters(cf => cf
                    // Remove literal spaces before tokenizing; tabs and other whitespace are left as-is.
                    .PatternReplace(Constants.ElasticSearch.FilterNames.RemoveWhitespace, pr => pr
                        .Pattern(" ")
                        .Replacement(string.Empty)
                    )
                )
                .Analyzers(analyzers => analyzers
                    .Custom(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer, a => a
                        .Filters("lowercase")
                        .CharFilters(Constants.ElasticSearch.FilterNames.RemoveWhitespace)
                        .Tokenizer(Constants.ElasticSearch.TokenizerNames.DefaultTokenizer)
                    )
                )
                .Tokenizers(tokenizers => tokenizers
                        .EdgeNGram(Constants.ElasticSearch.TokenizerNames.DefaultTokenizer, t => t
                            .MinGram(1)
                            .MaxGram(500)
                        )
                )
        )
    )
    .Mappings(ms => ms.Map<ServiceModel.DtoTypes.Customer.SearchResult>(m => m
        .AutoMap()
        .Properties(p => p
            .String(n => n.Name(c => c.CustomerName).CopyTo(f =>
            {
                return new FieldsDescriptor<string>().Field("search");
            }).Index(FieldIndexOption.Analyzed).Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(c => c.ContactName)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(c => c.CustomerName)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(c => c.City)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(c => c.StateAbbreviation)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(c => c.Country)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(c => c.PostalCode)
                        .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            .String(n => n.Name(Constants.ElasticSearch.CombinedSearchFieldName)
                        .Index(FieldIndexOption.Analyzed)
                        .Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
            )
        )
    );
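To see what this analyzer is likely to emit, here is a simplified model of the pipeline above (space removal, lowercasing, edge n-grams) in plain C#. The `EdgeNGramSketch` class is hypothetical and does not call Elasticsearch; it only assumes that an edge n-gram tokenizer produces prefixes of its input.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGramSketch
{
    // Mimics the pattern-replace character filter (" " -> "") followed by lowercasing.
    public static string Normalize(string text) =>
        text.Replace(" ", string.Empty).ToLowerInvariant();

    // Edge n-grams are prefixes only: "t", "th", "the", ...
    public static List<string> EdgeNGrams(string text, int minGram, int maxGram)
    {
        var s = Normalize(text);
        var grams = new List<string>();
        for (var len = minGram; len <= maxGram && len <= s.Length; len++)
            grams.Add(s.Substring(0, len));
        return grams;
    }
}
```

In this model, "The quick brown fox" yields 16 tokens, from "t" up to "thequickbrownfox", so a prefix search such as "theq" has a matching token. Mid-string searches such as "rown" or "nf" have none, which suggests they may still need the plain NGram tokenizer combined with the same character filter. Running sample text through the `_analyze` API on the real index is a good way to confirm what is actually produced.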