0
votes

I have a Java-flavored Google App Engine app where I need to go through many cycles of reading data from the datastore and then writing data back to it. Each cycle depends on being able to read the latest data from the previous cycle's writes.

Reading the Google documentation, it seems the way to guarantee this behavior is tying created entities to a common parent, i.e. new Entity(entity, parentKey).

The question is: does writing entities with the same ancestor (parent) entity really guarantee consistency? It seems that the parent entity would have the same issue as the children - That multiple instances could exist at different datastores. Clarity on this is greatly appreciated.

2 Answers

0
votes

Writing entities with the same ancestor (parent) entity DOES really guarantee consistency.

How?

You were wondering how this can be: "It seems that the parent entity would have the same issue as the children - That multiple instances could exist at different datastores."

The answer is that there is a very sophisticated algorithm (search for the Megastore paper and the Paxos algorithm if you are interested in the technical details) that implements ACID transactions on entity groups even when the entities live on different machines.

Why restrict ACID to entity groups?

You may be wondering then why they don't do this for the entire datastore. It seems like they figured out how to do ACID transactions on entities that are distributed, then why not implement it regardless of entity grouping?

The answer is that ACID transactions come at a cost, and the cost is this: you can only have up to about 5 writes to an entity group per second. More precisely, the limit is about 5 write transactions per second, so you can batch several writes into the same transaction to get better write throughput. (Google's documentation gives roughly one write per second per entity group as the sustained rate, so treat 5 as an upper bound.) If they did this for the whole database, it would be practically useless at the internet scale they are aiming for.
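To make the batching point concrete, here is a small back-of-the-envelope sketch. It is plain arithmetic, not Datastore code: the class and method names are made up for illustration, and the 5-transactions-per-second figure is simply the one quoted above, taken as an assumption.

```java
// Toy arithmetic only: models the per-entity-group write limit quoted above.
// WriteThroughput and entitiesPerSecond are hypothetical names, not an App Engine API.
final class WriteThroughput {

    // Entities persisted per second when each transaction batches entitiesPerTxn writes.
    static long entitiesPerSecond(int txnsPerSecond, int entitiesPerTxn) {
        return (long) txnsPerSecond * entitiesPerTxn;
    }

    public static void main(String[] args) {
        int limit = 5; // assumed transactions per second per entity group

        // One entity per transaction: throughput equals the transaction limit.
        System.out.println(entitiesPerSecond(limit, 1));  // 5

        // Twenty entities batched per transaction: same transaction rate, 20x the entities.
        System.out.println(entitiesPerSecond(limit, 20)); // 100
    }
}
```

The transaction rate stays fixed, so the only lever inside a single entity group is how many writes each transaction carries.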

Reading after writing gotcha:

As a side note, be careful when you read an entity after modifying it within the same transaction. Datastore transaction semantics are such that reads see the data as it was at the start of the transaction, so a read won't see writes made earlier inside that same transaction.
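The gotcha is easier to see in a toy model. The sketch below is an in-memory simulation written for this answer, not the Datastore API: SnapshotTxnDemo, Txn, and store are hypothetical names, and the model only assumes what is described above (reads come from a snapshot taken when the transaction begins, writes are applied at commit).

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of the read semantics described above; NOT the Datastore API.
final class SnapshotTxnDemo {

    // Committed state, standing in for one entity group.
    static final Map<String, Integer> store = new HashMap<>();

    static final class Txn {
        // State as it was when the transaction began.
        private final Map<String, Integer> snapshot;
        // Writes buffered until commit.
        private final Map<String, Integer> pending = new HashMap<>();

        Txn() {
            snapshot = new HashMap<>(store);
        }

        // Reads always come from the snapshot, never from pending writes.
        Integer get(String key) {
            return snapshot.get(key);
        }

        void put(String key, int value) {
            pending.put(key, value);
        }

        // Commit applies the buffered writes to the committed state.
        void commit() {
            store.putAll(pending);
        }
    }

    public static void main(String[] args) {
        store.put("counter", 1);

        Txn txn = new Txn();
        txn.put("counter", 2);
        System.out.println(txn.get("counter"));   // 1: the read ignores the write made in this transaction

        txn.commit();
        System.out.println(store.get("counter")); // 2: the write is visible after commit
    }
}
```

In practice this means that if a cycle needs the value it just wrote, it should keep that value in a local variable rather than reading it back before the transaction commits.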

Recommended reading (well, viewing)

This is almost a required video to watch if you are using the datastore. It will clear up a lot of these issues for you: Google I/O 2011: More 9s Please: Under The Covers of the High Replication Datastore

-1
votes

"...it seems the way to guarantee this behavior is tying created entities to a common parent, i.e. new Entity(entity, parentKey)."

Correct, this makes the child entities belong to the same entity group.

"It seems that the parent entity would have the same issue as the children - That multiple instances could exist at different datastores."

This is also correct. Each root entity (i.e. an entity that does not have a parent itself) belongs to its own entity group.

My approach to this problem was to model the DB so that all the data required for the current processing cycle shares the same entity group, and to read and write it inside a transaction. That still leaves room for some concurrency: for example, different users can be processed on different instances, as long as each user has all of its data in its own group.
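As a rough illustration of the "one entity group per user" layout, here is a toy model of key ancestry. It is not the App Engine Key class: KeyPath and its methods are hypothetical names, and the only rule it assumes is the one stated above, that an entity's group is determined by the root of its ancestor chain.

```java
// Toy model of key ancestry. KeyPath is a hypothetical class for illustration;
// the real App Engine Key class has its own API.
final class KeyPath {
    final String kind;
    final String name;
    final KeyPath parent; // null for a root entity

    KeyPath(String kind, String name, KeyPath parent) {
        this.kind = kind;
        this.name = name;
        this.parent = parent;
    }

    // Walk up the ancestor chain; the root identifies the entity group.
    KeyPath root() {
        KeyPath k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    // Two keys share an entity group when their roots have the same kind and name.
    boolean sameEntityGroup(KeyPath other) {
        KeyPath a = root();
        KeyPath b = other.root();
        return a.kind.equals(b.kind) && a.name.equals(b.name);
    }

    public static void main(String[] args) {
        KeyPath alice = new KeyPath("User", "alice", null);
        KeyPath bob = new KeyPath("User", "bob", null);
        KeyPath aliceOrder = new KeyPath("Order", "o1", alice);
        KeyPath aliceItem = new KeyPath("Item", "i1", aliceOrder);
        KeyPath bobOrder = new KeyPath("Order", "o1", bob);

        System.out.println(aliceOrder.sameEntityGroup(aliceItem)); // true: both descend from alice
        System.out.println(aliceOrder.sameEntityGroup(bobOrder));  // false: different roots
    }
}
```

With this layout, a cycle that processes alice's data and a cycle that processes bob's data touch different groups, so they do not compete for the same per-group write budget.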

If you need to process all entities of a kind at once, as in a snapshot, I'm afraid App Engine's Datastore is not really the best option. You could make all your entities children of a hardcoded root entity, like this:

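// Hardcoded root key shared by every entity (kind and "root" are placeholders).
Key root = KeyFactory.createKey(kind, "root");
// Passing the root key as the parent puts the new entity in the root's entity group.
new Entity(entity, root);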

It works, but it brings a whole lot of problems. The main one is that it slows down your app: every write goes through the same entity group, so you can't perform many writes at once. You can read more about it here.