For the sake of simplicity, let's say my app needs to allow thousands of users to see a real-time, read-only stream of a chat room. The host can type messages, but no other users can; they just see what the host types, in real time. Imagine users following a textual play-by-play of a sporting event.
Each user checks for new messages by polling once per second using a simple /get-recent-messages call to the GAE server. (Before you ask, I believe using Google's Channel API would be far too expensive.)
Considering that this app is used by many thousands of users simultaneously, which means dozens to hundreds of GAE instances are running, how can I get these /get-recent-messages calls to return the latest chat room messages with less than 1000 ms latency, while minimizing server load and GAE costs?
Some ideas I had:
1. Store chat messages in datastore entities.
   - Obviously this is way too slow and expensive, especially if using queries/indexes.
2. Store chat messages in memcache keys. I imagine I'd use one key to store the list of keys for the last 50 messages, and then 50 more keys, one for each message.
   - This would cause huge bottlenecks because App Engine's memcache shards by key, so all of the (say) 500 instances would be constantly reading from the same memcache keys and thus the same memcache server.
3. Store chat messages in instance memory, backed by memcache, and pull from memcache (like in #2) when instance memory is stale.
   - This would probably result in an expensive race condition when multiple requests see a stale instance memory cache and pull from memcache simultaneously.
4. Use a background thread to update instance memory from memcache. This could run once per second per instance using a thread started in the warmup request. It would work like #3 but with only one thread pulling instead of random requests triggering memcache reads.
   - I don't think this is how background threads work on App Engine. I also don't really know if this is an appropriate use of warmup requests.
5. Use Google's Pub/Sub service.
   - I have no idea how this works, and it seems like it could be overkill for this use case.
6. Run a once-per-second cron job to pull from memcache. This would be like #4 except without relying on background threads.
   - I'd need this to run on every instance every second. I don't believe the cron/taskqueue API has a way to run a job or task on all active instances.
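To make idea #3 more concrete, here is a rough Python 3 sketch of an instance-memory cache where only one request at a time refreshes a stale copy, which is one possible way around the race condition I'm worried about. Everything here is an assumption for illustration: `InstanceMessageCache` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever actually reads the shared store (memcache, for example). It is not an App Engine API.

```python
import threading
import time


class InstanceMessageCache:
    """Per-instance cache of recent messages with a single-flight refresh.

    `fetch` is a hypothetical callable that reads the shared store
    (e.g. memcache) and returns the latest list of messages.
    """

    def __init__(self, fetch, ttl_seconds=1.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._messages = []
        self._loaded_at = None  # None means "never loaded"

    def _is_fresh(self):
        return (self._loaded_at is not None
                and self._clock() - self._loaded_at < self._ttl)

    def get(self):
        if self._is_fresh():
            return self._messages
        # Non-blocking acquire: only one request refreshes the cache.
        # Requests that lose the race return the stale copy instead of
        # also hitting the shared store (on a cold instance that copy
        # is an empty list until the first refresh completes).
        if self._lock.acquire(False):
            try:
                if not self._is_fresh():
                    self._messages = self._fetch()
                    self._loaded_at = self._clock()
            finally:
                self._lock.release()
        return self._messages
```

With something like this, each instance would hit the shared store roughly once per TTL regardless of how many requests it serves, at the cost of some requests seeing messages that are up to one TTL old. It does nothing about every instance reading the same memcache keys, which is the concern in #2.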
Thoughts?