After a few days of testing and developing some Google Dataflow templates in Python, I'm questioning whether BigQuery is appropriate for our project.
Why these doubts?
BigQuery focuses on Online Analytical Processing (OLAP), so treating it like an Online Transaction Processing (OLTP) database might not be the best option.
Even knowing that, I decided to use BQ because it has better compatibility with Google Dataflow.
After developing a first version of a Python Dataflow template and trying it with a small amount of data, it works well. Whether I run this pipeline in streaming or batch mode, I don't have any particular issues.
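For context, this is a simplified sketch of what my template does (bucket, table and schema names are placeholders, not my real ones):

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # streaming=True gives the same pipeline shape for the streaming case.
    options = PipelineOptions(streaming=False)
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input/*.json')
         | 'Parse' >> beam.Map(json.loads)
         | 'Write' >> beam.io.WriteToBigQuery(
             'my-project:my_dataset.events',
             schema='id:INTEGER,status:STRING,updated_at:TIMESTAMP',
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))


if __name__ == '__main__':
    run()
```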
The problems begin when I try to use BQ like an OLTP database. These are the issues I found in my use case:

- In some cases I cannot use bulk updates and have to perform single-row mutations, so I have to use DML statements for this task. As a result, execution time starts to increase considerably with hundreds of single-row mutations (a sketch of the kind of statement I'm running is below).
- Even if I accept this increase in time, I found that BQ only allows 1,000 UPDATE statements per table per day (and more quotas apply).
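To illustrate, this is roughly the kind of single-row DML I end up issuing (table and column names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()


def update_row(row_id, new_status):
    # Each call runs a full query job, so hundreds of these get slow
    # and eat into the per-table DML quota.
    query = """
        UPDATE `my-project.my_dataset.orders`
        SET status = @status
        WHERE id = @id
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter('status', 'STRING', new_status),
            bigquery.ScalarQueryParameter('id', 'INT64', row_id),
        ]
    )
    client.query(query, job_config=job_config).result()  # block until done
```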
Alternatives
This is where I'm not sure which other services I should use instead of BigQuery:
Cloud SQL is Google's OLTP database. It could be appropriate for my use case, but: does Cloud SQL work as fast as BQ when I analyze the data? I will have many tables with millions of rows and, for example, Microsoft SQL Server is not sufficient for me due to its high processing times.
How does Cloud SQL work with Apache Beam and pipelines?
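I imagine something like this DoFn is the way to write to Cloud SQL from Beam, assuming a MySQL instance reachable from the workers and the pymysql driver (all connection details are placeholders), but I don't know if it's the recommended pattern:

```python
import apache_beam as beam
import pymysql  # assuming a MySQL Cloud SQL instance


class UpsertToCloudSQL(beam.DoFn):
    """Upserts each element into Cloud SQL, one connection per worker."""

    def setup(self):
        # Placeholder credentials; on Dataflow you would normally connect
        # through the Cloud SQL proxy or a private IP.
        self.conn = pymysql.connect(host='10.0.0.3', user='app',
                                    password='secret', database='mydb')

    def process(self, element):
        with self.conn.cursor() as cur:
            cur.execute(
                'INSERT INTO orders (id, status) VALUES (%s, %s) '
                'ON DUPLICATE KEY UPDATE status = VALUES(status)',
                (element['id'], element['status']))
        self.conn.commit()

    def teardown(self):
        self.conn.close()


# Usage: rows | 'Upsert' >> beam.ParDo(UpsertToCloudSQL())
```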
Should I work with Cloud SQL and migrate the data to BigQuery each day, to avoid making single-row mutations in BQ?
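For example, I'm picturing a daily load job from a Cloud SQL CSV export in GCS instead of DML (file and table names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Load jobs don't consume the DML quota, so a daily snapshot load
# could replace the single-row updates.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # replace table
)
load_job = client.load_table_from_uri(
    'gs://my-bucket/exports/orders.csv',  # produced by gcloud sql export
    'my-project.my_dataset.orders',
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```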
Which service would be best to use in this case?