0
votes

I am investigating what might be the best infrastructure for storing log files from many clients.

Google App Engine offers a nice solution that doesn't turn the process into an IT nightmare: load balancing, sharding, servers, user authentication, all in one place with almost zero configuration.

However, I wonder whether the Datastore model is the right one for storing logs. Each log entry should be saved as a single document. Each client uploads its logs on a daily basis, and a single upload can consist of 100K log entries.

Plus, there are some limitations and open questions that could break the requirements:

  1. 60-second timeout on a bulk transaction: how many log entries per second will I be able to insert? If 100K entries won't fit into the 60-second window, this will affect the design and the amount of work that needs to go into the server.
  2. 5 inserts per entity per second: is a transaction considered a single insert?
  3. Post-analysis: text search and searching for similar log entries across clients. How flexible and efficient is Datastore with these queries?
  4. Real-time data fetch: getting all the recent log entries.
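
To put the first point in numbers, here is a rough back-of-the-envelope sketch. The 100K entries and the 60-second window come from the figures above; the batch size of 500 is purely an assumption for illustration, not a confirmed Datastore figure.

```python
import math

ENTRIES_PER_UPLOAD = 100_000  # from the question: up to 100K entries per client per day
TIME_BUDGET_S = 60            # from the question: 60-second request window
BATCH_SIZE = 500              # assumption: entries written per batched call

# Sustained write rate needed to finish one upload inside the window.
rate = ENTRIES_PER_UPLOAD / TIME_BUDGET_S

# Number of batched calls, and the time each one may take on average
# if the calls run strictly one after another.
n_batches = math.ceil(ENTRIES_PER_UPLOAD / BATCH_SIZE)
per_batch_budget_s = TIME_BUDGET_S / n_batches

print(f"required rate: {rate:.0f} entries/s")
print(f"batches: {n_batches}, budget per batch: {per_batch_budget_s:.2f} s")
```

So under these assumptions a single request would have to sustain roughly 1,667 writes per second, or the upload has to be split across several requests or background tasks.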

The other option is to deploy an Elasticsearch cluster on Google Compute Engine and write our own server that fetches data from ES.

Thanks!

2
2. The write limit is per entity group. 3. This is where you will run into problems with Datastore. BigQuery, on the other hand, supports bulk or streaming inserts as well as an SQL-like query interface with regular expression support, all still with minimal configuration. - tx802
I think Google BigQuery is better to use for storing and processing logs. - Igor Artamonov
Entity group performance is not relevant here. - Zig Mandel

2 Answers

0
votes

It is a bad idea to use Datastore, and even worse if you use entity groups with parent/child relationships, which a comment mentions when comparing performance. Those numbers do not apply here, but Datastore is not at all designed for what you want. BigQuery is what you want: it is designed for this, especially if you later want to analyze the logs in a SQL-like fashion. Any more detail requires that you ask a specific question, as it seems you haven't read much about either service.
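
As a rough illustration of the bulk route: BigQuery load jobs accept newline-delimited JSON, so a client's daily log can be turned into one such file before upload. The sketch below covers only that serialization step. The field names (`client_id`, `line_no`, `message`, `uploaded_at`) and the helper `to_ndjson` are hypothetical, and the actual load call is left out.

```python
import json
from datetime import datetime, timezone

def to_ndjson(client_id, raw_lines):
    """Serialize raw log lines as newline-delimited JSON, one object per line."""
    uploaded_at = datetime.now(timezone.utc).isoformat()
    rows = (
        {
            "client_id": client_id,        # hypothetical field names
            "line_no": i,
            "message": line.rstrip("\n"),
            "uploaded_at": uploaded_at,
        }
        for i, line in enumerate(raw_lines, start=1)
    )
    return "\n".join(json.dumps(row) for row in rows)

sample = to_ndjson("client-42", ["disk full\n", "retrying upload\n"])
print(sample)
```

Keeping one JSON object per log entry also matches the "one document per entry" requirement from the question.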

0
votes

I do not agree. Datastore is a fully managed NoSQL document store: you can store whatever logs you want in it and query them directly in Datastore. The benefit of using it instead of BigQuery is that it is schemaless. In BigQuery you have to define the schema before inserting the logs, which is not necessary with Datastore. Think of Datastore as the Google Cloud counterpart of a MongoDB log-analysis use case.