3
votes

To my understanding, Google Cloud Datastore lets me write new entities without any rate limit but limits how often I can update a single entity. In addition, indexes are not strongly consistent.

I am rapidly writing new sensor data associated with a single weather station into the datastore. Each entity also contains a timestamp. There is an index sorting sensor readings by weather station and timestamp.

The goal is to always return the most recent value to a user requesting the current value for a specific weather station. But because the index is only eventually consistent, the returned value may not be the most recent one.

Any ideas what an architecture on Google App Engine could look like that always returns the most recent value without the risk of hitting the write limit on a single entity?

2
The write limit for an entity group is around 1/sec and consistency is typically achieved within seconds. How often do you read and need to update the sensor data from one station? - Dan Cornilescu

2 Answers

3
votes

An alternative to writing all data from one station into a single entity group and using ancestor queries would be to write each set of sensor readings as a separate new entity and re-write a (small) well-known entity which contains the key of the most recent readings entity.

To get the most recent measurement you just read its key from the well-known entity and then fetch the entity by key lookup, which is always strongly consistent.
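The write and read paths described above can be sketched roughly as follows. This is only an illustration: `FakeStore` is a hypothetical in-memory stand-in that models nothing but key-based put/get, and the kind names `Reading` and `LatestPointer` are made up for the example. With the real Datastore client you would use its own key-based put and get calls instead.

```python
import itertools


class FakeStore:
    """Hypothetical in-memory stand-in for Datastore (key-based put/get only)."""

    def __init__(self):
        self._entities = {}

    def put(self, key, entity):
        self._entities[key] = entity

    def get(self, key):
        return self._entities.get(key)


_reading_ids = itertools.count(1)  # stand-in for Datastore-allocated IDs


def write_reading(store, station_id, value, timestamp):
    """Store the reading as a new entity, then re-write the station's pointer."""
    reading_key = ("Reading", next(_reading_ids))
    store.put(reading_key, {"station": station_id, "value": value, "timestamp": timestamp})
    # The small, fixed-size well-known entity: one per station, holds only a key.
    store.put(("LatestPointer", station_id), {"reading_key": reading_key})
    return reading_key


def latest_reading(store, station_id):
    """Two key lookups, no query and no index involved."""
    pointer = store.get(("LatestPointer", station_id))
    if pointer is None:
        return None
    return store.get(pointer["reading_key"])
```

Note that this sketch overwrites the pointer unconditionally, so it assumes readings for a station arrive in timestamp order.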

You'd still be limited to writing samples no faster than once per second (on average), but at least this approach:

  • uses no ancestry and thus can avoid the large per-station entity groups you currently have and which can cause contention, see "Keep entity groups small"
  • uses no datastore queries
  • needs no indexes, thus avoiding the hot-spot problem you currently have from indexing the monotonically increasing timestamp property (see "High read/write rates to a narrow key range")
  • is not impacted by the size of the readings sample - only the small, fixed-size well-known entity is re-written

If you really need to write more than 1 sensor reading per second you could try to either:

  • use the sharding strategy with multiple well-known entities (up to 25 - that's the max number of entity groups that can be accessed in a cross-group transaction) containing keys of the readings most recently written to the datastore. You'd have to read all of them in a transaction and pick the one with the most recent timestamp
  • use memcache instead of the well-known entities - easily re-written since memcache tolerates much higher write rates. But you need to accept the possibility that once in a while memcache may fail and you'll have to resort to some query-based fallback to recover, during which you might return readings that are not really the latest (or maybe just returning errors during these periods would be acceptable?)
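A rough sketch of the sharded variant from the first bullet, under some assumptions: a plain `dict` stands in for Datastore key lookups, the shard count and the `LatestPointerShard` kind name are made up, and each shard stores the timestamp next to the reading key so the reader can compare shards without fetching the readings. In a real application the read of all shards would happen inside a cross-group transaction, as described above.

```python
import random

NUM_SHARDS = 5  # assumption for the example; the answer caps this at 25


def shard_key(station_id, shard):
    """Key of one well-known pointer entity for a station (hypothetical naming)."""
    return ("LatestPointerShard", f"{station_id}:{shard}")


def record_latest(store, station_id, reading_key, timestamp, rng=random):
    """Re-write one randomly chosen shard, spreading writes over several entities."""
    shard = rng.randrange(NUM_SHARDS)
    store[shard_key(station_id, shard)] = {
        "reading_key": reading_key,
        "timestamp": timestamp,
    }


def latest_reading_key(store, station_id):
    """Read every shard by key and keep the pointer with the newest timestamp."""
    pointers = [store.get(shard_key(station_id, s)) for s in range(NUM_SHARDS)]
    pointers = [p for p in pointers if p is not None]
    if not pointers:
        return None
    return max(pointers, key=lambda p: p["timestamp"])["reading_key"]
```

Each shard is re-written only about 1/NUM_SHARDS as often as a single pointer would be, at the cost of NUM_SHARDS key lookups per read.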
0
votes

Try reading:

https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/

Basically, use an ancestor query and then your queries will be strongly consistent -- you will be able to query the most recent update.

Google Cloud Datastore supports one write per second per entity group. So long as each individual weather station writes less than once per second to its entity group you will be fine.
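To illustrate why the ancestor query helps, here is a toy model only, not the Datastore API: the class, its method names, and the explicit `catch_up()` step are all invented for the example. It assumes entity groups are always current while the global index used by non-ancestor queries lags behind until it catches up.

```python
class ToyDatastore:
    """Toy model: entity groups are always current, the global index lags."""

    def __init__(self):
        self.groups = {}        # ancestor key -> readings in that entity group
        self.global_index = []  # (ancestor, reading) pairs visible to global queries
        self._pending = []      # writes not yet applied to the global index

    def put(self, ancestor, reading):
        self.groups.setdefault(ancestor, []).append(reading)
        self._pending.append((ancestor, reading))

    def catch_up(self):
        """Simulate the global index eventually becoming consistent."""
        self.global_index.extend(self._pending)
        self._pending.clear()

    def ancestor_query_latest(self, ancestor):
        """Strongly consistent in this model: reads the entity group directly."""
        readings = self.groups.get(ancestor, [])
        return max(readings, key=lambda r: r["timestamp"], default=None)

    def global_query_latest(self, ancestor):
        """Eventually consistent in this model: sees only the lagging index."""
        readings = [r for a, r in self.global_index if a == ancestor]
        return max(readings, key=lambda r: r["timestamp"], default=None)


ds = ToyDatastore()
station = ("Station", "s1")
ds.put(station, {"timestamp": 1, "value": 10.0})
ds.catch_up()
ds.put(station, {"timestamp": 2, "value": 11.0})
fresh = ds.ancestor_query_latest(station)  # sees the reading with timestamp 2
stale = ds.global_query_latest(station)    # still sees timestamp 1 until catch_up()
```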