I'm new to Elasticsearch. I have a requirement to return all results that contain a given keyword.

public class People {
    public string UserId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

I want to return every People document where any of the three fields contains the keyword, similar to SQL's LIKE "%keyword%".

For example, I have this People document:

var people = new People {
    UserId = "lastname.middlename.firstname",
    FirstName = "firstname",
    LastName = "lastname"
};

How can I find this document by searching for the keyword ddl? How should I set up the index, and how should I query it?

I tried the following query with NEST:

  var keyword = "ddl";
  // multi_match across the three People fields
  var result = await _client.SearchAsync<People>(s => s
      .Query(q => q
          .MultiMatch(m => m
              .Fields(f => f.Field(ff => ff.UserId).Field(ff => ff.FirstName).Field(ff => ff.LastName))
              .Query(keyword))));

It doesn't return anything. It only works when I change the keyword to firstname, lastname, or lastname.middlename.firstname.

Is there any way to meet the requirement?

2 Answers

The short answer is that you would want to configure an analyzer for each of the target fields that tokenizes terms into trigrams, probably using the ngram token filter with min_gram and max_gram both set to 3. This analysis will generate a ddl token from middlename, which would then match your query.
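
To see where the ddl token comes from, here is a plain C# sketch (no Elasticsearch involved) that mimics what an ngram token filter with min_gram and max_gram set to 3 emits for a single token after a lowercase filter. The NGrams helper is hypothetical and only approximates the real filter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: approximates an ngram token filter by emitting every
// substring of length minGram..maxGram from a single (lowercased) token.
static List<string> NGrams(string token, int minGram, int maxGram)
{
    var grams = new List<string>();
    var text = token.ToLowerInvariant(); // stands in for a lowercase filter
    for (int start = 0; start < text.Length; start++)
    {
        for (int size = minGram; size <= maxGram && start + size <= text.Length; size++)
        {
            grams.Add(text.Substring(start, size));
        }
    }
    return grams;
}

var trigrams = NGrams("middlename", 3, 3);
Console.WriteLine(string.Join(", ", trigrams));
// prints: mid, idd, ddl, dle, len, ena, nam, ame
```

Because ddl ends up as one of the indexed terms, a query for ddl can match it directly instead of needing a substring scan.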

The longer answer is that you'll want to read up on analysis, and on how to write and test analyzers with the .NET client. You may also want to go through the example repository that builds a NuGet search application; it's a fairly involved walkthrough that covers a number of concepts, including analysis.

To search on parts of your fields, you should use an ngram tokenizer in your mapping.

It splits your field values into sliding windows (n-grams) whose sizes range from min_gram to max_gram.
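
A rough plain-C# approximation of the ngram tokenizer, assuming token_chars is restricted to letters: characters outside that class (such as .) act as separators, and every window between min_gram and max_gram is emitted inside each run of letters. The values min_gram = 3 and max_gram = 4 are example choices, and NGramTokenize is a hypothetical helper. Counting grams per window size also gives a feel for how much the index grows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: approximates an ngram tokenizer with token_chars = [letter].
// Non-letters split the text into runs; each run yields windows of minGram..maxGram.
static List<string> NGramTokenize(string text, int minGram, int maxGram)
{
    // Split into runs of letters, e.g. "lastname.middlename" -> "lastname", "middlename".
    var runs = new List<string>();
    var current = new StringBuilder();
    foreach (char c in text)
    {
        if (char.IsLetter(c)) current.Append(c);
        else if (current.Length > 0) { runs.Add(current.ToString()); current.Clear(); }
    }
    if (current.Length > 0) runs.Add(current.ToString());

    // Emit every window of each allowed size inside each run.
    var grams = new List<string>();
    foreach (var run in runs)
        for (int start = 0; start < run.Length; start++)
            for (int size = minGram; size <= maxGram && start + size <= run.Length; size++)
                grams.Add(run.Substring(start, size));
    return grams;
}

var grams = NGramTokenize("lastname.middlename.firstname", 3, 4);
foreach (var group in grams.GroupBy(g => g.Length))
    Console.WriteLine($"size {group.Key}: {group.Count()} grams");
Console.WriteLine($"total: {grams.Count}");
// prints: size 3: 21 grams / size 4: 18 grams / total: 39
```

One 29-character field value becomes 39 indexed terms here, and each extra window size adds roughly another run's worth of terms.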

This should solve your problem, but you need to take care of a few points:

  • You will likely want to use this analyzer only at index time. Using it at both index and search time is likely to produce a LOT of irrelevant results.
  • Don't set the minimum window size (the min_gram parameter) too low. In your case, 3 will work.
  • Your index can grow substantially in size.
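
Putting those points together, here is a sketch of a create-index request body that applies a trigram analyzer at index time only, built with System.Text.Json so it can be sent with any HTTP client (or reproduced with NEST's fluent mapping). The tokenizer and analyzer names are hypothetical; the field names assume NEST's default camel-casing of UserId, FirstName and LastName, and the typeless mappings section assumes Elasticsearch 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Shared field mapping: ngram analysis when indexing, plain analysis when searching.
var trigramField = new Dictionary<string, object>
{
    ["type"] = "text",
    ["analyzer"] = "trigram_analyzer",  // hypothetical name, used at index time
    ["search_analyzer"] = "standard"    // query text is not split into grams
};

var body = new Dictionary<string, object>
{
    ["settings"] = new Dictionary<string, object>
    {
        ["analysis"] = new Dictionary<string, object>
        {
            ["tokenizer"] = new Dictionary<string, object>
            {
                ["trigram_tokenizer"] = new Dictionary<string, object>
                {
                    ["type"] = "ngram",
                    ["min_gram"] = 3,  // not too low: tiny grams match almost everything
                    ["max_gram"] = 3,
                    ["token_chars"] = new[] { "letter", "digit" }
                }
            },
            ["analyzer"] = new Dictionary<string, object>
            {
                ["trigram_analyzer"] = new Dictionary<string, object>
                {
                    ["type"] = "custom",
                    ["tokenizer"] = "trigram_tokenizer",
                    ["filter"] = new[] { "lowercase" }
                }
            }
        }
    },
    ["mappings"] = new Dictionary<string, object>
    {
        // Assumes NEST's default camel-casing: UserId -> userId, and so on.
        ["properties"] = new Dictionary<string, object>
        {
            ["userId"] = trigramField,
            ["firstName"] = trigramField,
            ["lastName"] = trigramField
        }
    }
};

var json = JsonSerializer.Serialize(body, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```

One trade-off of analyzing only at index time: a keyword longer than max_gram (for example middle) is not split into grams at search time, so it won't match any indexed term. Widening max_gram helps at the cost of an even larger index, and the gap between max_gram and min_gram must not exceed index.max_ngram_diff (1 by default), so that setting would need raising too.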

Another solution, simpler to implement but usually inefficient, is to use wildcard queries in a query string. It is very similar to SQL's LIKE operator.
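
For the wildcard approach, a sketch of a search body that uses a query_string query with a leading and trailing *, again built with System.Text.Json rather than NEST. The field names assume NEST's camel-casing, and the keyword is assumed to contain no query_string reserved characters (such as : or parentheses), which would otherwise need escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var keyword = "ddl"; // assumed free of query_string reserved characters

// *ddl* behaves roughly like SQL's LIKE '%ddl%'. The leading * means
// Elasticsearch has to check many indexed terms, which is what makes it slow.
var searchBody = new Dictionary<string, object>
{
    ["query"] = new Dictionary<string, object>
    {
        ["query_string"] = new Dictionary<string, object>
        {
            ["query"] = $"*{keyword}*",
            ["fields"] = new[] { "userId", "firstName", "lastName" }
        }
    }
};

var json = JsonSerializer.Serialize(searchBody);
Console.WriteLine(json);
// prints: {"query":{"query_string":{"query":"*ddl*","fields":["userId","firstName","lastName"]}}}
```

This needs no special mapping, which is what makes it simpler, but every search pays the cost of the wildcard expansion instead of paying it once at index time.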