3
votes

I have the following entities:

@Entity
public class Person {
  @Id public Long id;
  public String name;
  public Ref<Picture> picture;
  public String email;
  public byte age;
  public short birthday; // day of year
  public String school;
  public String very_long_life_story;
  ... some extra fields ...
}

@Entity
public class Place {
  @Id public Long id;
  public String name;
  public String comment;
  public long createdDateMS;
  public long visitors;

  @Load public List<Ref<Person>> owners;
}

A few notes:

(A) The maximum size of owners in the Place entity is about 4.

(B) The Person class is presumably very big, and when querying a Place I would like to show only a subset of the Person data. This optimization is aimed at both server-client and server-database communication. Since Objectify (GAE, actually) only loads and saves entire entities, I would like to do the following:

@Entity
public class PersonShort {
  @Id public Long id;
  public Ref<Picture> picture;
  public String name;
}

and inside Place, I would like to have (instead of owners):

@Load public List<PersonShort> owners;

(C) The problem with this approach is that I now have duplicated data inside the datastore. Although this isn't such a bad thing in itself, the real problem arises when a Person saves a new picture or changes their name: I will have to update it not only in the Person entity, but also find every Place that holds a PersonShort with the same id and update that too.

(D) So the question is: is there any solution, or am I simply forced to choose between these options? (1) Loading multiple Person entities, which are big, when all I need is a small amount of information from each. (2) Data duplication, with many writes.

If so, which one is better? (Currently, I believe it's 1.)

EDIT: What about loading the entire class (option 1), but sending only part of it?

@Entity
public class Person {
  @Id public Long id;
  public String name;
  public Ref<Picture> picture;
  public String email;
  public byte age;
  public short birthday; // day of year
  public String school;
  public String very_long_life_story;
  ... some extra fields ...
}

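public class PersonShort {
  public long id;
  public String name;

  // Copy only the fields that should reach the client
  public PersonShort(Person p) { id = p.id; name = p.name; }
}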

@Entity
public class Place {
  @Id public Long id;
  public String name;
  public String comment;
  public long createdDateMS;
  public long visitors;

  // Ignore saving to datastore
  @Ignore
  public List<PersonShort> owners;

  // Do not serialize when sending to client
  @ApiResourceProperty(ignored = AnnotationBoolean.TRUE)
  @Load public List<Ref<Person>> ownersRef;

  @OnLoad private void loadOwners() {
    // Requires java.util.ArrayList; List is an interface and cannot be instantiated
    owners = new ArrayList<PersonShort>();
    for (Ref<Person> r : ownersRef) {
      owners.add(new PersonShort(r.get()));
    }
  }
}

3 Answers

3
votes

It sounds like you are optimizing prematurely. Do you know you have a performance issue?

Unless you're talking about hundreds of kilobytes, don't worry about the size of your Person object in advance. There is no practical value in hiding a few extra fields unless the size is severe, and in that case you should extract the big fields into some sort of meaningful entity (PersonPicture or whatnot).

1
votes

No definite answer, but some suggestions to look at:

  1. Lifecycle callbacks. When you put your Person entity, you can have an @OnSave handler to automatically store your new PersonShort entity. This has the advantage of being transparent to the caller, but obviously you are still dealing with 2 entity writes instead of 1.

    You may also find you are having to fetch two entities too; initially you may fetch the PersonShort and then later need some of the detail in the corresponding Person. Remember Objectify's caching can reduce your trips to Datastore: it's arguably better to have a bigger, cached, entity than two separate entities (meaning two RPCs).

  2. Store your core properties (the ones in PersonShort) as separate properties in your Person class and then have the extended properties as a single JSON string which you can deserialize with Gson.

    This has the advantage that you are not duplicating properties, but the disadvantage is that anything you want to be able to search on cannot be in the JSON blob.

  3. Projection Queries. You can tell Datastore to return only certain properties from your entities. The problem with this method is that you can only return indexed properties, so you will probably find you need too many indexes for this to be viable.

Also, use @Load annotations with care. For example, in your Place class, think whether you really need all those owners' Person details when you fetch the owners. Perhaps you only need one of them? i.e., instead of getting a Place and 4 Persons every time you fetch a Place, maybe you are better off just loading the required Person(s) when you need them? (It will depend on your application.)
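
To make the load-on-demand point concrete, here is a minimal plain-Java sketch of a reference that fetches its target only when it is first asked for. LazyRef and its loader are hypothetical names for illustration only; this is not Objectify's Ref API, just the general idea of deferring the fetch until the data is actually needed.

```java
import java.util.function.Supplier;

// Hypothetical lazy reference (not an Objectify class):
// the loader runs only on the first call to get().
class LazyRef<T> {
    private final Supplier<T> loader; // stands in for a datastore fetch
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    // Fetch on first use, then reuse the cached value.
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

With something along these lines, fetching a Place would cost nothing extra for owners that are never displayed; whether that beats eager loading depends on how often the owners are actually needed.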

0
votes

It is a good practice to return a different entity to your client than the one you get from your database. So you could create a ShortPerson or something that is only used as a return object in your REST endpoints. It will accept a Person in its constructor and fill in the properties you want to return to the client from this more complete object.

The advantage of this approach is actually less about optimization and more that your server models will change over time, but that change can be completely independent of your API. Also, you choose which data is publicly accessible, which is what you are trying to do.

As for the optimization between db and server, I wouldn't worry about it until it is an issue.
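
A minimal sketch of the return-object idea, assuming a simplified Person without the Objectify annotations. The fromAll helper and the exact field selection are my own assumptions for illustration, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the datastore entity (annotations omitted).
class Person {
    Long id;
    String name;
    String email;
    String veryLongLifeStory;

    Person(Long id, String name, String email, String veryLongLifeStory) {
        this.id = id;
        this.name = name;
        this.email = email;
        this.veryLongLifeStory = veryLongLifeStory;
    }
}

// Return object for the REST endpoint: exposes only what the client may see.
class ShortPerson {
    final long id;
    final String name;

    // Copy the public subset from the full entity.
    ShortPerson(Person p) {
        this.id = p.id;
        this.name = p.name;
    }

    // Hypothetical helper: convert a list of loaded entities in one call.
    static List<ShortPerson> fromAll(List<Person> people) {
        List<ShortPerson> out = new ArrayList<>(people.size());
        for (Person p : people) {
            out.add(new ShortPerson(p));
        }
        return out;
    }
}
```

The endpoint would then return ShortPerson instances (for example, the result of fromAll on a Place's loaded owners), so fields such as email never leave the server even though the full entity was loaded.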