I am investigating what might be the best infrastructure for storing log files from many clients.
Google App Engine offers a nice solution that doesn't turn the process into an IT nightmare: load balancing, sharding, servers, user authentication - all in one place with almost zero configuration.
However, I wonder whether the Datastore model is the right fit for storing logs. Each log entry would be saved as a single document; each client uploads its documents on a daily basis, and a daily upload can consist of up to 100K log entries.
Plus, there are some limitations and open questions that could break the requirements:
- 60-second timeout on a bulk transaction - how many log entries per second will I be able to insert? If 100K entries won't fit into the 60-second window, this will affect the design and the work that needs to go into the server (see the batching sketch after this list).
- 5 inserts per entity per second - is a transaction considered a single insert?
- Post-hoc analysis - full-text search, and searching for similar log entries across clients. How flexible and efficient is Datastore for these queries?
- Real-time data fetch - getting all the most recent log entries (see the query sketch below).
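Here's roughly what I have in mind for the bulk insert, using the google-cloud-datastore Python client (the `LogEntry` kind and its fields are just placeholders I made up; timing this loop should answer the inserts-per-second question for a given setup):

```python
from datetime import datetime, timezone

from google.cloud import datastore  # pip install google-cloud-datastore

client = datastore.Client()  # assumes default project and credentials

BATCH_SIZE = 500  # put_multi is capped at 500 entities per call


def store_log_entries(client_id, lines):
    """Write raw log lines as individual LogEntry entities in batches."""
    entities = []
    for line in lines:
        # Root entity with an auto-allocated ID; the 'LogEntry' kind
        # and field names are assumptions, not a fixed schema.
        entity = datastore.Entity(key=client.key("LogEntry"))
        entity.update({
            "client_id": client_id,
            "message": line,
            "timestamp": datetime.now(timezone.utc),
        })
        entities.append(entity)

    # 100K entries => ~200 put_multi calls.
    for i in range(0, len(entities), BATCH_SIZE):
        client.put_multi(entities[i : i + BATCH_SIZE])
```

If I understand the docs correctly, making each entry a root entity with an auto-allocated ID should also sidestep the per-entity-group write limit, since no two entries share an entity group.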
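And for the real-time fetch, something like the following sketch (an inequality filter plus descending order on the same property, which Datastore supports; newer client versions prefer a `PropertyFilter` object over the positional `add_filter` form):

```python
from datetime import datetime, timedelta, timezone


def recent_entries(client, minutes=5):
    """Fetch LogEntry entities from the last few minutes, newest first."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    query = client.query(kind="LogEntry")
    query.add_filter("timestamp", ">", cutoff)
    query.order = ["-timestamp"]
    return list(query.fetch(limit=1000))
```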
The other option is to deploy an Elasticsearch cluster on Google Compute Engine and write the server ourselves, fetching data from ES.
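For comparison, here's roughly what that path would look like with the official Python client (a sketch assuming elasticsearch-py 8.x and a hypothetical cluster address; per-day indices are a common pattern for logs, and the full-text search across clients falls out naturally):

```python
from datetime import datetime, timezone

from elasticsearch import Elasticsearch, helpers  # pip install elasticsearch

# Hypothetical address of the cluster on Compute Engine.
es = Elasticsearch("http://10.0.0.5:9200")


def index_log_entries(client_id, lines):
    """Bulk-index raw log lines into a per-day index, e.g. logs-2024.01.15."""
    index = "logs-" + datetime.now(timezone.utc).strftime("%Y.%m.%d")
    actions = (
        {
            "_index": index,
            "_source": {
                "client_id": client_id,
                "message": line,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            },
        }
        for line in lines
    )
    helpers.bulk(es, actions)


def search_logs(text):
    """Full-text search across all clients and all daily indices."""
    return es.search(index="logs-*", query={"match": {"message": text}})
```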
Thanks!