
Consider the following Datastore entity:

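import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;
@Entity
public class Employee {
    @Id String id;
    @Index String userName;
}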

My understanding is that only properties used as filter criteria in queries need to be annotated with @Index, and that indexing in Datastore is not about performance but about making the data fetchable by a query at all.

  1. Should id also be annotated with @Index to query by id? If not, does Datastore automatically create indexes for keys?
  2. The @Id annotation ensures uniqueness, but it has no performance advantage over indexed properties. Is that right?
  3. Will querying by id be faster than querying by userName in the above example?

3 Answers

4
votes

1:

No, you don't need to explicitly index it. Datastore uses your key as a primary key for your entities (in the Entities table).

2 & 3:

Querying by primary key is more efficient (you only need a single scan on the primary table instead of a scan on the index followed by a lookup in the primary table). Better still, it lets you do a lookup instead of a query:

Employee e = ofy().load().type(Employee.class).id("<id>").now();

Besides avoiding the query planning and the index scan needed to look up this Employee, a lookup is strongly consistent. With a query, you may write a new Employee and then not actually see it when you query for it.

While Strong Consistency is important from an application correctness point-of-view, it will be slower. In particular, when you do a strongly consistent lookup, Datastore may need to talk to the other replicas (in other data centers) to catch up your entity group.

If you are OK with eventual consistency, you can use a read policy to perform an eventually consistent lookup, which avoids both the index scan and the replica catch-up. In Objectify, this looks like:

Employee e = ofy().consistency(Consistency.EVENTUAL).load()
    .type(Employee.class).id("<id>").now();

Note: This answer talks a lot about indexes and tables. In general, I recommend not thinking about Datastore in terms of indexes and tables (since it is not a relational storage system). However, it is implemented on a relational DB, so that view is useful for answering your questions. This page has a lot of good background.
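To make the table-and-index picture in this answer concrete, here is a toy in-memory model in plain Java. It is only an illustration under stated assumptions: the class `ToyDatastore`, its methods, and the idea of counting one "row read" per table row touched are hypothetical, and real Datastore storage and billing work differently. It shows why a lookup by key touches one row, while a query on the indexed `userName` touches an index row and then the entity row it points to.

```java
import java.util.*;

// Toy in-memory model, NOT Datastore's real implementation: an entities table
// keyed by primary key plus one index table for the indexed userName property.
// rowReads counts every table row touched, to compare the two access paths.
class ToyDatastore {
    final Map<String, String> entities = new HashMap<>();                  // id -> userName (only stored property)
    final TreeMap<String, TreeSet<String>> userNameIndex = new TreeMap<>(); // userName -> ids
    int rowReads = 0;

    void put(String id, String userName) {
        String old = entities.put(id, userName);
        if (old != null) {
            userNameIndex.get(old).remove(id); // drop the stale index row
        }
        userNameIndex.computeIfAbsent(userName, k -> new TreeSet<>()).add(id);
    }

    // Lookup by key: a single read on the entities table.
    String getById(String id) {
        rowReads++;
        return entities.get(id);
    }

    // Query on userName: scan the matching index rows, then follow each key
    // back into the entities table. Returns the stored data of each match.
    List<String> queryByUserName(String userName) {
        List<String> ids = new ArrayList<>();
        for (String id : userNameIndex.getOrDefault(userName, new TreeSet<>())) {
            rowReads++; // index row
            ids.add(id);
        }
        List<String> results = new ArrayList<>();
        for (String id : ids) {
            rowReads++; // entity row
            results.add(entities.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        ToyDatastore ds = new ToyDatastore();
        ds.put("e1", "alice");
        ds.put("e2", "bob");

        ds.getById("e1");
        System.out.println("lookup by id: " + ds.rowReads + " row read(s)");       // 1

        ds.rowReads = 0;
        ds.queryByUserName("alice");
        System.out.println("query by userName: " + ds.rowReads + " row read(s)"); // 2
    }
}
```

The gap grows with the number of matches, since each matching index row adds one more entity-row read in this model.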

2
votes
  1. No, it will be created automatically.
  2. @Id makes the field the entity's key.
  3. I can't find confirmation, but it must be faster. It's also cheaper than a query: 1 read for a get vs. 2 reads for a query. See https://cloud.google.com/datastore/docs/pricing

Also, keep in mind that if you decide to add the @Index annotation later, index entries will only be created for entities saved after the change; all existing entities will stay unindexed. This means you need to reindex (re-save) the existing entities, or only new records will be returned by a query that filters on this field.
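A toy model can show why existing entities are missed. The sketch below is hypothetical plain Java, not Objectify or Datastore code: it assumes index rows are written only at save time, and uses a boolean field `userNameIndexed` to stand in for adding `@Index`. "Reindexing" here just means saving every existing entity again; in Objectify that usually amounts to loading the existing entities and saving them back.

```java
import java.util.*;

// Toy model, NOT Datastore's real implementation: index rows for userName
// are written only when an entity is saved while the property is indexed.
class LateIndexDemo {
    final Map<String, String> entities = new HashMap<>();           // id -> userName
    final Map<String, Set<String>> userNameIndex = new HashMap<>(); // userName -> ids
    boolean userNameIndexed = false; // hypothetical stand-in for the @Index annotation

    void save(String id, String userName) {
        String old = entities.put(id, userName);
        // Drop any index row for the previous value of this entity.
        if (old != null && userNameIndex.containsKey(old)) {
            userNameIndex.get(old).remove(id);
        }
        // Index rows are written here, at save time, and only if indexed.
        if (userNameIndexed) {
            userNameIndex.computeIfAbsent(userName, k -> new TreeSet<>()).add(id);
        }
    }

    // A filter query only sees entities that have an index row.
    Set<String> queryIdsByUserName(String userName) {
        return userNameIndex.getOrDefault(userName, Collections.emptySet());
    }

    // "Reindexing": save every existing entity again so its index row is written.
    void resaveAll() {
        for (Map.Entry<String, String> e : new ArrayList<>(entities.entrySet())) {
            save(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        LateIndexDemo db = new LateIndexDemo();
        db.save("e1", "alice");     // saved before @Index was added
        db.userNameIndexed = true;  // @Index added later
        db.save("e2", "alice");     // saved after the change

        System.out.println(db.queryIdsByUserName("alice")); // prints [e2]
        db.resaveAll();
        System.out.println(db.queryIdsByUserName("alice")); // prints [e1, e2]
    }
}
```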

0
votes

Objectify always does a get by key: if you run a query, it does a keys-only query, then fetches the results by id. This works well because it has cache integration, and it also means that you get accurate results (the entity data is strongly consistent, even though the query results aren't). You can control this using the .hybrid(boolean) method on a query.
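The keys-only-query-then-get pattern described above can be sketched with a toy in-memory model. Everything here is hypothetical plain Java (the class `HybridQueryDemo` and its maps are not Objectify APIs); it assumes the index can lag behind the entity data, as an eventually consistent index can, while a get by key always returns the latest entity.

```java
import java.util.*;

// Toy model of "keys-only query, then get by key". NOT Objectify code:
// the index may be stale, but gets by key read the up-to-date entities table.
class HybridQueryDemo {
    final Map<String, String> entities = new HashMap<>();          // key -> userName (latest)
    final Map<String, Set<String>> laggingIndex = new HashMap<>(); // userName -> keys (may be stale)

    // Step 1: keys-only query, answered purely from the (possibly stale) index.
    List<String> keysOnlyQuery(String userName) {
        return new ArrayList<>(laggingIndex.getOrDefault(userName, Collections.emptySet()));
    }

    // Step 2: batch get by key, answered from the up-to-date entities table.
    Map<String, String> getAll(List<String> keys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String k : keys) {
            String entity = entities.get(k);
            if (entity != null) {
                result.put(k, entity);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HybridQueryDemo db = new HybridQueryDemo();
        // e1 was saved as "alice" and indexed...
        db.entities.put("e1", "alice");
        db.laggingIndex.computeIfAbsent("alice", k -> new TreeSet<>()).add("e1");
        // ...then renamed to "alicia", but the index has not caught up yet.
        db.entities.put("e1", "alicia");

        List<String> keys = db.keysOnlyQuery("alice"); // stale match
        Map<String, String> fetched = db.getAll(keys);  // current data
        System.out.println(keys + " -> " + fetched);    // prints [e1] -> {e1=alicia}
    }
}
```

In this run the key `e1` still matches `alice` in the lagging index, but the entity fetched by key already shows `alicia`, which is the sense in which the fetched data is accurate even when the query result is not.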

You cannot query by id; you can only get by key. If you want to query by id, you need a duplicate indexed field and must query on that. This is an artifact of how keys work in Datastore.