
I want to have a search engine for my users. Let's say user class is:

public class User
{
    public string Code { get; set; }
    public string Name { get; set; }
}

I have such users in my db:

(1) new User { Code = "XW1234", Name = "John Doe" }, 
(2) new User { Code = "AD4567", Name = "Jane Doe" }

So:

  - When my query is "doe" (mind the lowercase), I want to see (1) and (2).
  - When my query is "4", I want to see (1) and (2).
  - When my query is "x", I want to see (1).
  - When my query is "ja", I want to see (2).

I want it to work like LIKE '%doe%' in SQL. Please don't mind the query length; I will require at least 3 letters. This is just an example.

I have a solution that uses wildcards. It works, but performance is poor.

I tried configuring the index to use an ngram tokenizer, but without success: I received an empty collection.

I also tried the "starts with" approach described here: https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html That did not work either.

Please provide the C# code. I don't know if I'm translating the Elasticsearch JSON correctly.

EDIT: Following the first comment, I tried this:

private const string DefaultIndexName = "test";
private const string ElasticSearchServerUri = @"http://192.168.99.100:32769";

private static readonly IndexName UsersIndexName = "users";

public IElasticClient CreateElasticClient()
{
    var settings = CreateConnectionSettings();

    var client = new ElasticClient(settings);

    var studentsIndexDescriptor = new CreateIndexDescriptor(UsersIndexName)
        .Mappings(ms => ms
            .Map<User>(m => m
                .Properties(ps => ps
                    .String(s => s
                        .Name(n => n.Code)
                        .Analyzer("substring_analyzer")))));
    client.CreateIndex(UsersIndexName, descriptor => studentsIndexDescriptor
        .Settings(s => s
            .Analysis(a => a
                .Analyzers(analyzer => analyzer
                    .Custom("substring_analyzer", analyzerDescriptor => analyzerDescriptor.Tokenizer("standard").Filters("lowercase", "substring")))
                .TokenFilters(tf => tf
                    .NGram("substring", filterDescriptor => filterDescriptor.MinGram(2).MaxGram(15))))));

    return client;
}

private static ConnectionSettings CreateConnectionSettings()
{
    var uri = new Uri(ElasticSearchServerUri);
    var settings = new ConnectionSettings(uri);
    settings
        .DefaultIndex(DefaultIndexName);

    return settings;
}

And I used this query:

public IEnumerable<User> Search(string query)
{
    var result = elasticClient.Search<User>(descriptor => descriptor
        .Query(q => q
            .QueryString(queryDescriptor => queryDescriptor.Query(query).Fields(fs => fs.Fields(f1 => f1.Code)))));
    return result.Documents;
}

It didn't work.

I tried the codes "1234" and "5678". Querying with "23" or "5" returns no results. Searching for "1234" returns the correct user.

You need to use an ngram token filter. This answer might help: stackoverflow.com/questions/34331249/… - Val
"Please provide the C# code". What have you tried so far? Can you show your attempt with an NGram tokenizer / NGram token filter? - Russ Cam
@RussCam - I added an example - pinus.acer

1 Answer


I suspect within your code:

  1. When indexing the users, you do not specify the users index and so the users are indexed into the default index.
  2. When searching, you do not specify the users index and so query against the default index, test. That index does contain the indexed documents, but the code field is not analyzed with substring_analyzer, because that analysis is defined only in the users index.
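
One way to act on this diagnosis is to name the index on every request. The following is a minimal sketch, assuming NEST 2.x and a reachable cluster. ExplicitIndexSketch and its two methods are hypothetical names used only for illustration, and the query itself is copied from the question.

```csharp
using System.Collections.Generic;
using Nest;

public class User
{
    public string Code { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper class, not part of NEST.
public static class ExplicitIndexSketch
{
    private const string UsersIndexName = "users";

    // Index the documents into "users" rather than the default index.
    public static void IndexUsers(IElasticClient client, IEnumerable<User> users)
    {
        client.IndexMany(users, UsersIndexName);
    }

    // Search "users" rather than the default index.
    public static IEnumerable<User> Search(IElasticClient client, string query)
    {
        var result = client.Search<User>(s => s
            .Index(UsersIndexName)
            .Query(q => q
                .QueryString(qs => qs
                    .Query(query)
                    .Fields(fs => fs.Fields(f => f.Code)))));

        return result.Documents;
    }
}
```

This works, but every call site has to remember to pass the index name.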

NEST provides a configuration option on ConnectionSettings, .InferMappingFor<T>(), to associate a particular POCO type with a particular index name; this index will be used if one is not specified on the request and takes precedence over the default index.

var uri = new Uri(ElasticSearchServerUri);
var settings = new ConnectionSettings(uri)
    .DefaultIndex(DefaultIndexName)
    .InferMappingFor<User>(d => d
        .IndexName(UsersIndexName)
    );

The rest of your code is correct. Here's a complete working example:

private const string DefaultIndexName = "test";
private const string ElasticSearchServerUri = @"http://localhost:9200";
private const string UsersIndexName = "users";

void Main()
{
    var client = CreateElasticClient();

    var users = new[] {
        new User { Code = "XW1234", Name = "John Doe" },
        new User { Code = "AD4567", Name = "Jane Doe" }
    };

    client.IndexMany(users);

    // refresh the index after indexing so that the documents are immediately 
    // available for search. This is good for testing, 
    // but avoid doing it in production.
    client.Refresh(UsersIndexName);

    var result = client.Search<User>(descriptor => descriptor
        .Query(q => q
            .QueryString(queryDescriptor => queryDescriptor
                .Query("1234")
                .Fields(fs => fs
                    .Fields(f1 => f1.Code)
                )
            )
        )
    );

    // outputs 1
    Console.WriteLine(result.Total);
}

public class User
{
    public string Code { get; set; }
    public string Name { get; set; }
}

public IElasticClient CreateElasticClient()
{
    var settings = CreateConnectionSettings();
    var client = new ElasticClient(settings);

    // this is here so that the example can be re-run.
    // Remove this!
    if (client.IndexExists(UsersIndexName).Exists)
    {
        client.DeleteIndex(UsersIndexName);
    }

    client.CreateIndex(UsersIndexName, descriptor => descriptor
        .Mappings(ms => ms
            .Map<User>(m => m
                .AutoMap()
                .Properties(ps => ps
                    .String(s => s
                        .Name(n => n.Code)
                        .Analyzer("substring_analyzer")
                    )
                )
            )
        )
        .Settings(s => s
            .Analysis(a => a
                .Analyzers(analyzer => analyzer
                    .Custom("substring_analyzer", analyzerDescriptor => analyzerDescriptor
                        .Tokenizer("standard")
                        .Filters("lowercase", "substring")
                    )
                )
                .TokenFilters(tf => tf
                    .NGram("substring", filterDescriptor => filterDescriptor
                        .MinGram(2)
                        .MaxGram(15)
                    )
                )
            )
        )
    );

    return client;
}

private static ConnectionSettings CreateConnectionSettings()
{
    var uri = new Uri(ElasticSearchServerUri);
    var settings = new ConnectionSettings(uri)
        .DefaultIndex(DefaultIndexName)
        .InferMappingFor<User>(d => d
            .IndexName(UsersIndexName)
        );

    return settings;
}
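
To see why substring queries match once the documents are in the index that carries the mapping, it can help to look at roughly which terms substring_analyzer stores for a code. The sketch below is plain C#, not NEST or Lucene code, and only approximates the lowercase and ngram filters with MinGram(2) and MaxGram(15). NGramSketch and NGrams are hypothetical names, and the sketch assumes the standard tokenizer keeps "XW1234" as a single token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NGramSketch
{
    // Hypothetical helper: lowercases one token and returns every substring
    // whose length is between minGram and maxGram (inclusive).
    public static IEnumerable<string> NGrams(string token, int minGram, int maxGram)
    {
        var lower = token.ToLowerInvariant();

        for (var start = 0; start < lower.Length; start++)
        {
            for (var length = minGram;
                 length <= maxGram && start + length <= lower.Length;
                 length++)
            {
                yield return lower.Substring(start, length);
            }
        }
    }

    public static void Main()
    {
        var grams = NGrams("XW1234", 2, 15).ToList();

        // Lists all grams of "xw1234" with length 2 or more.
        Console.WriteLine(string.Join(", ", grams));

        // prints "True": the two-character query "23" matches a stored gram
        Console.WriteLine(grams.Contains("23"));

        // prints "False": a one-character query is shorter than MinGram(2)
        Console.WriteLine(grams.Contains("x"));
    }
}
```

Under these assumptions a single-character query such as "x" or "4" finds nothing with MinGram(2), which is consistent with the three-letter minimum mentioned in the question.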