
I'm trying to upgrade my application from Elasticsearch NEST 1.7 to 2.4, and the attribute-based mapping looks like it should work, but it doesn't (completely). I have a model class like this:

[DataContract]
[ElasticsearchType(IdProperty = "Id")]
public class Series
{
    [DataMember]
    [String(Index = FieldIndexOption.Analyzed, Analyzer = "custom_en")]
    public string Description { get; set; }

    [DataMember]
    [String(Index = FieldIndexOption.NotAnalyzed)]
    public HashSet<Role> ReleasableTo { get; set; }
}

The equivalent declaration in Nest 1.x worked, and my term query against the field returned the results I expected. When I received no results under 2.4, I checked the mapping, and to my surprise Index = FieldIndexOption.NotAnalyzed was not respected. My generated mapping looked something like this:

"properties" : {
    "description" : {
        "type": "string"
    },
    "releasableTo" : {
        "type": "string"
    }
}

So neither the field with the custom analyzer nor the field that should not be analyzed was mapped properly.
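
For reference, the term query that stopped returning results looks roughly like this (the role value here is just illustrative):

var searchResponse = client.Search<Series>(s => s
    .Query(q => q
        .Term(t => t
            .Field(f => f.ReleasableTo)
            .Value("Analyst") // illustrative value; real values come from the Role type
        )
    )
);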

This is the code I used to initialize everything:

            var indexDescriptor = new CreateIndexDescriptor(DefaultIndex)
                .Mappings(ms => ms
                    .Map<Series>(m => m.AutoMap())
                );

            indexDescriptor.Settings(s => s
                .NumberOfShards(3)
                .NumberOfReplicas(2)
                .Analysis(a => a
                    .CharFilters(c => c.Mapping("&_to_and", mf => mf.Mappings("&=> and ")))
                    .TokenFilters(t => t.Stop("en_stopwords", tf => tf.StopWords(new StopWords(stopwords)).IgnoreCase()))
                    .Analyzers(z => z
                        .Custom("custom_en", ca => ca
                            .CharFilters("html_strip", "&_to_and")
                            .Tokenizer("standard")
                            .Filters("lowercase", "en_stopwords")
                        )
                    )
                )
            );

            client.CreateIndex(indexDescriptor);

NOTE: client is the Elasticsearch client.
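
For completeness, the client is constructed along these lines (the node address here is illustrative):

var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .DefaultIndex(DefaultIndex);
var client = new ElasticClient(settings);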

I know the DataContract attributes don't strictly apply to Elasticsearch, but I also need to serialize these objects to disk for processing. With Nest 1.x there was no conflict, so this didn't cause any problems.

I'm not concerned about the analyzer creation. I'm concerned that the mapping doesn't respect anything more specific than the type.

How do I get Nest 2.x to respect the additional information in the attributes so I don't have to manually map them when declaring the mappings?
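
For reference, the manual fallback I'm trying to avoid looks roughly like this, overriding AutoMap() with an explicit fluent property for just the ReleasableTo field:

var indexDescriptor = new CreateIndexDescriptor(DefaultIndex)
    .Mappings(ms => ms
        .Map<Series>(m => m
            .AutoMap()
            .Properties(p => p
                .String(sp => sp
                    .Name(n => n.ReleasableTo)
                    .Index(FieldIndexOption.NotAnalyzed)
                )
            )
        )
    );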


UPDATE: It turns out the mapping problem was caused by other types that were mapped at the same time; the index returned an invalid response that I didn't catch. It was frustrating to work through, but the mapping is working correctly now.
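
The fix on my end was simply to start checking the validity of every response instead of ignoring it. Roughly (response property names as I understand them in NEST 2.x):

var createIndexResponse = client.CreateIndex(indexDescriptor);
if (!createIndexResponse.IsValid)
{
    // Surface the server error instead of silently swallowing it
    Console.WriteLine(createIndexResponse.ServerError);
    Console.WriteLine(createIndexResponse.DebugInformation);
}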

NOTE: fixed a typo that @RussCam pointed out in his answer. - Berin Loritsch

1 Answer


I'm not sure if it's a typo, but your type with the attributes is Series, yet you're mapping a type Service.

I can't reproduce what you're seeing with NEST 2.5.0. Here's a complete example:

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var defaultIndex = "default-index";
    var connectionSettings = new ConnectionSettings(pool, new InMemoryConnection())
            .DefaultIndex(defaultIndex)
            .PrettyJson()
            .DisableDirectStreaming()
            .OnRequestCompleted(response =>
                {
                    // log out the request
                    if (response.RequestBodyInBytes != null)
                    {
                        Console.WriteLine(
                            $"{response.HttpMethod} {response.Uri} \n" +
                            $"{Encoding.UTF8.GetString(response.RequestBodyInBytes)}");
                    }
                    else
                    {
                        Console.WriteLine($"{response.HttpMethod} {response.Uri}");
                    }

                    Console.WriteLine();

                    // log out the response
                    if (response.ResponseBodyInBytes != null)
                    {
                        Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                                 $"{Encoding.UTF8.GetString(response.ResponseBodyInBytes)}\n" +
                                 $"{new string('-', 30)}\n");
                    }
                    else
                    {
                        Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                                 $"{new string('-', 30)}\n");
                    }
                });

    var client = new ElasticClient(connectionSettings);

    var stopwords = "stopwords";

    var indexDescriptor = new CreateIndexDescriptor(defaultIndex)
        .Mappings(ms => ms
            .Map<Series>(m => m.AutoMap())
        );

    indexDescriptor.Settings(s => s
        .NumberOfShards(3)
        .NumberOfReplicas(2)
        .Analysis(a => a
            .CharFilters(c => c.Mapping("&_to_and", mf => mf.Mappings("&=> and ")))
            .TokenFilters(t => t.Stop("en_stopwords", tf => tf.StopWords(new StopWords(stopwords)).IgnoreCase()))
            .Analyzers(z => z
                .Custom("custom_en", ca => ca
                    .CharFilters("html_strip", "&_to_and")
                    .Tokenizer("standard")
                    .Filters("lowercase", "en_stopwords")
                )
            )
        )
    );

    client.CreateIndex(indexDescriptor);

}

[DataContract]
[ElasticsearchType(IdProperty = "Id")]
public class Series
{
    [DataMember]
    [String(Index = FieldIndexOption.Analyzed, Analyzer = "custom_en")]
    public string Description { get; set; }

    [DataMember]
    [String(Index = FieldIndexOption.NotAnalyzed)]
    public HashSet<Role> ReleasableTo { get; set; }
}

This uses InMemoryConnection, so no requests are made to Elasticsearch (remove it to actually send the request). The create index request looks like

{
  "settings": {
    "index.number_of_replicas": 2,
    "index.number_of_shards": 3,
    "analysis": {
      "analyzer": {
        "custom_en": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "filter": [
            "lowercase",
            "en_stopwords"
          ],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and "
          ]
        }
      },
      "filter": {
        "en_stopwords": {
          "type": "stop",
          "stopwords": "stopwords",
          "ignore_case": true
        }
      }
    }
  },
  "mappings": {
    "series": {
      "properties": {
        "description": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "custom_en"
        },
        "releasableTo": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

which has the respective property mappings. Bear in mind that if the index already exists, the mapping change will not be applied, so you would need to delete and recreate the index in that case.
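
If you do need to recreate, something along these lines would work (defaultIndex is the index name from the example above):

if (client.IndexExists(defaultIndex).Exists)
{
    // WARNING: deleting the index removes all documents in it
    client.DeleteIndex(defaultIndex);
}

client.CreateIndex(indexDescriptor);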