Summary

Recently we upgraded to Spring Data Elasticsearch 4.x. As part of this major release, Jackson is no longer used to convert our domain objects to JSON (MappingElasticsearchConverter is used instead) [1]. This means we are now forced to add a new id field to all our documents.

Previously we had domain objects like this:

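import com.fasterxml.jackson.annotation.JsonIgnore;
import org.springframework.data.annotation.Id;

public class ESDocument {
    @Id
    private String id;

    private String field1;

    // Keeps Jackson from writing the id into _source
    @JsonIgnore
    public String getId() {
        return id;
    }

    public String getField1() {
        return field1;
    }
}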

Which resulted in documents like this in ES:

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
  "_score" : 1.0,
  "_source" : {
    "field1" : "blabla"
  }
}

Note that:

  1. The @JsonIgnore annotation used to ensure that we were not required to have an id field in the _source.
  2. We are setting the document id ourselves and it ends up in _id.

Problem

With Spring Data Elasticsearch 4.x, the @JsonIgnore annotation is no longer respected, which means we are now forced to have an id field in the _source, as shown below:

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
  "_score" : 1.0,
  "_source" : {
    "id": "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
    "field1" : "blabla"
  }
}

Questions

  1. Is it still possible to avoid duplicating the document identifier (i.e. in both the _id and id fields)? If so, how? (Note that we already tried @org.springframework.data.annotation.Transient, which does not work because Spring Data Elasticsearch then thinks our document does not have an id.)
  2. Was our previous approach of suppressing the id field in _source incorrect or problematic?

Versions

java: 1.8.0_252
elasticsearch: 7.6.2
spring-boot: 2.3.1.RELEASE
spring-data-elasticsearch: 4.0.1.RELEASE

References

[1] - https://spring.io/blog/2020/05/27/what-s-new-in-spring-data-elasticsearch-4-0

The id field will always be present in the document. However, if you don't want it, you could remove it from your mappings instead, or use queries that exclude the id field when fetching documents. Ref: elastic.co/guide/en/elasticsearch/reference/current/… - Harshit

Thanks @Harshit. Like I said in my question, up until now we have not had a _source.id field in our documents. Thanks for the link. - Oliver Henlich

1 Answer

Question 1:

To omit the id field from the _source, you would normally use the @Transient annotation, but as you wrote, this does not work for the id property. Transient properties are ignored in Spring Data modules (not only Spring Data Elasticsearch).

But you can use the org.springframework.data.annotation.ReadOnlyProperty annotation for this:

@Id
@ReadOnlyProperty
private String id;

To be honest, I didn't know until now that this exists. It comes from Spring Data Commons as well, and it is checked in the property's isWritable() method when the MappingElasticsearchConverter writes properties.
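
To make the mechanism a bit more concrete, here is a small self-contained sketch of the idea. It does not use Spring at all: the Id and ReadOnlyProperty annotations below are hypothetical stand-ins declared locally so the code compiles on its own, and ToyConverter is an invented toy, not the actual MappingElasticsearchConverter. It only illustrates how a writability check can keep the id out of what would become the _source:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the Spring Data annotations of the same name,
// declared here only so the sketch compiles without Spring on the classpath.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Id {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ReadOnlyProperty {}

// Same shape as the entity from the question, with the id marked read-only.
class ESDocument {
    @Id
    @ReadOnlyProperty
    private String id;

    private String field1;

    ESDocument(String id, String field1) {
        this.id = id;
        this.field1 = field1;
    }

    String getId() { return id; }
    String getField1() { return field1; }
}

// Toy converter (an assumption for illustration, not Spring's implementation).
class ToyConverter {

    // A field counts as writable unless it is annotated as read-only.
    static boolean isWritable(Field field) {
        return !field.isAnnotationPresent(ReadOnlyProperty.class);
    }

    // Collects only the writable fields into the map that would become _source.
    static Map<String, Object> toSource(Object entity) {
        Map<String, Object> source = new LinkedHashMap<>();
        try {
            for (Field field : entity.getClass().getDeclaredFields()) {
                if (field.isSynthetic() || !isWritable(field)) {
                    continue; // read-only fields are skipped on write
                }
                field.setAccessible(true);
                source.put(field.getName(), field.get(entity));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return source;
    }
}

class ReadOnlyIdSketch {
    public static void main(String[] args) {
        ESDocument doc = new ESDocument("d5bf7b5c-7a44-42f9-94d6-d59fe3988482", "blabla");
        // The id is still available on the entity (and would be sent as _id),
        // but it is left out of the source map.
        System.out.println(ToyConverter.toSource(doc)); // prints {field1=blabla}
    }
}
```

In the real entity you would of course import the two annotations from org.springframework.data.annotation instead of declaring them yourself.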

Question 2:

Certainly not incorrect, but problematic, as you found out. We always consider the whole entity when storing it, so we never thought about not writing the id. Strictly speaking, you're right that it is not necessary: we always get the id back in the _id field together with the _source, so we can easily put the entity back together. We just never considered this a necessary feature.

Note:

When you look at the data in your ES index, you will find that the MappingElasticsearchConverter writes an additional _source field named _class, which contains the name of the entity class (or a defined alias). In case you wonder where this comes from: it allows for mapping generics; see the documentation for further info.