I am using Elasticsearch v6.8 with the NEST client, writing C# code with fluent mapping.

I am indexing an email field so that users can be found by searching on their email address. The standard analyzer didn't work, so I read up on the uax_url_email tokenizer and plugged it in. It works better than the standard analyzer, but I still can't search using the '@' or '.' characters. For example, typing "firstname" gets a match, but typing "firstname@" or "firstname.lastname" does not.

What am I doing wrong? I assumed the uax_url_email tokenizer would handle this. When I switched to NGram instead, it worked, but it seems strange that the built-in email tokenizer doesn't handle the @ sign and similar characters.

Here's my field mapping (it's a plain string):

.Map<UserSearchEntity>(m => m
    .AutoMap()
    .Properties(p => p
        .Text(t => t
            .Name(n => n.Email)
            .Analyzer("user_email_analyzer"))))

The analyzer has been registered previously, with the uax_url_email tokenizer.

1 Answer

Here is a simple app showing usage of the uax_url_email tokenizer.

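private static async Task Main()
{
    // Assumption: a single local node; adjust the URL for your cluster.
    var client = new ElasticClient(
        new ConnectionSettings(new Uri("http://localhost:9200")).DefaultIndex("my_index"));

    // Custom analyzer built on the uax_url_email tokenizer.
    // MaxTokenLength(3): tokens longer than 3 characters are split at
    // 3-character intervals (the default is 255).
    var createIndexResponse = await client.CreateIndexAsync("my_index", c => c
        .Settings(s => s.Analysis(a => a
            .Analyzers(an => an.Custom("my_analyzer", cu => cu.Tokenizer("my_tokenizer")))
            .Tokenizers(t => t.UaxEmailUrl("my_tokenizer", u => u.MaxTokenLength(3)))))
        // Apply the custom analyzer to the Email text field.
        .Mappings(m => m
            .Map<Document>(map => map
                .Properties(p => p.Text(t => t.Name(n => n.Email).Analyzer("my_analyzer"))))));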

    var indexResponse = await client.IndexAsync(new Document {Id = "1", Email = "[email protected]"},
        i => i.Refresh(Refresh.WaitFor));

    await Search(client, "robert.lyson");
    await Search(client, "robert");
    await Search(client, "lyson");
    await Search(client, "@domain.com");
    await Search(client, "domain.com");
    await Search(client, "rob");
}

private static async Task Search(ElasticClient client, string query)
{
    var searchResponse = await client.SearchAsync<Document>(s => s
        .Query(q => q.Match(m => m.Field(f => f.Email).Query(query))));

    System.Console.WriteLine($"result for query \"{query}\": {string.Join(",", searchResponse.Documents.Select(x => x.Email))}");
}

public class Document
{
    public string Id { get; set; }
    public string Email { get; set; }
}

Output:

result for query "robert.lyson": [email protected]
result for query "robert": [email protected]
result for query "lyson": [email protected]
result for query "@domain.com": [email protected]
result for query "domain.com": [email protected]
result for query "rob": [email protected]

Tested with Elasticsearch 6.8.0 and NEST 6.8.x.
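
If you want to see why a given query does or doesn't match, it can help to look at the tokens the analyzer actually emits. Here is a minimal sketch using the Analyze API through NEST 6.x, assuming the "my_index" index and "my_analyzer" analyzer from the example above already exist. The cluster URL and the sample address jane.doe@example.com are hypothetical placeholders, not taken from the code above.

```csharp
using System;
using System.Threading.Tasks;
using Nest;

public static class AnalyzeSketch
{
    public static async Task Main()
    {
        // Assumption: a single local node; adjust the URL for your cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        // Run the text through the index's analyzer without indexing anything.
        var analyzeResponse = await client.AnalyzeAsync(a => a
            .Index("my_index")
            .Analyzer("my_analyzer")
            .Text("jane.doe@example.com")); // hypothetical sample address

        if (!analyzeResponse.IsValid)
        {
            // Prints request/response details, e.g. when the index is missing.
            Console.WriteLine(analyzeResponse.DebugInformation);
            return;
        }

        // Each token is a term that a match query can hit.
        foreach (var token in analyzeResponse.Tokens)
        {
            Console.WriteLine($"{token.Position}: {token.Token} ({token.Type})");
        }
    }
}
```

A match query only finds a document when one of the analyzed query terms equals one of these indexed tokens, so comparing the token list for the stored value with the token list for the search text usually shows where the mismatch is.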

Hope that helps.