Using Elasticsearch and NEST 2.x
Based on some (crazy) user requirements I need to copy all searchable fields to a single field, lowercase it, and ignore spaces. When the user types in something to search, I lowercase it and remove spaces to use as the search string.
As an example: "The quick brown fox"... in Elasticsearch I want this to be "thequickbrownfox" for search purposes.
The following searches should match the above document:
- the
- thequick
- t
- rown
- nf
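To make the intended behavior concrete, here is a minimal plain C# sketch of the normalization described above, with no Elasticsearch involved. The `SearchNormalizer.Normalize` helper is a hypothetical name introduced only for illustration; it lowercases the text and strips whitespace, and the loop then checks that each example query is a substring of the normalized document text.

```csharp
using System;
using System.Linq;

static class SearchNormalizer
{
    // Hypothetical helper (not part of NEST): removes all whitespace
    // characters and lowercases what remains.
    public static string Normalize(string text) =>
        new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray()).ToLowerInvariant();
}

static class Program
{
    static void Main()
    {
        // The document text as it should look to the search.
        string indexed = SearchNormalizer.Normalize("The quick brown fox");
        Console.WriteLine(indexed); // thequickbrownfox

        // Each example query is normalized the same way and must be
        // found somewhere inside the normalized document text.
        foreach (var query in new[] { "the", "thequick", "t", "rown", "nf" })
        {
            string normalized = SearchNormalizer.Normalize(query);
            Console.WriteLine($"{query}: {indexed.Contains(normalized)}"); // True for all five
        }
    }
}
```

This is only a statement of the matching rule the index is supposed to reproduce, namely a substring match on the lowercased, whitespace-free text.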
Here's how I'm building the index:
var customerSearchIdxDesc = new CreateIndexDescriptor(Constants.ElasticSearch.CustomerSearchIndexName)
    .Settings(f =>
        f.Analysis(analysis => analysis
            .Analyzers(analyzers => analyzers
                .Custom(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram, a => a
                    .Filters("lowercase")
                    .Tokenizer(Constants.ElasticSearch.TokenizerNames.NoWhitespaceNGram)))
            .Tokenizers(tokenizers => tokenizers
                .NGram(Constants.ElasticSearch.TokenizerNames.NoWhitespaceNGram, t => t
                    .MinGram(1)
                    .MaxGram(500)
                    .TokenChars(TokenChar.Digit, TokenChar.Letter, TokenChar.Punctuation, TokenChar.Symbol)
                )
            )
        )
    )
    .Mappings(ms => ms.Map<ServiceModel.DtoTypes.Customer.SearchResult>(m => m
        .AutoMap()
        .Properties(p => p
            .String(n => n.Name(c => c.CustomerName).CopyTo(f =>
            {
                return new FieldsDescriptor<string>().Field("search");
            }).Index(FieldIndexOption.Analyzed).Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.ContactName)
                .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.CustomerName)
                .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.City)
                .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.StateAbbreviation)
                .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.Country)
                .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(c => c.PostalCode)
                .CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
            .String(n => n.Name(Constants.ElasticSearch.CombinedSearchFieldName)
                .Index(FieldIndexOption.Analyzed)
                .Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
        )
    )
    );
As you can see, I'm using the lowercase filter on the analyzer, and using TokenChars so that whitespace is omitted (well, that's the idea; it isn't working).
Here's what I'm using to search:
var response = client.Search<DtoTypes.Customer.SearchResult>(s =>
    s.From(0)
        .Take(Constants.ElasticSearch.MaxResults)
        .Query(q => q
            .MatchPhrase(mp => mp
                .Field(Constants.ElasticSearch.CombinedSearchFieldName)
                .Query(query))));
Here are the issues:
- Whitespace does not appear to be omitted (it looks like it's only matching on words).
- Partial matching appears to work only on the suffix, e.g. searching "aby" will not match "abyss", but "yss" will.
- Searching across words isn't working: with "the quick" indexed, searching "theq" matches nothing.
_all field may help you here - elastic.co/guide/en/elasticsearch/reference/current/…. You can also set up your own analysis on it too – Russ Cam
multi_fields, apply different analysis to the sub-fields and then combine queries across those fields with bool queries, using features such as boosting, rescoring, etc. to control relevancy – Russ Cam
multi_field in case it may help with other scenarios – Russ Cam