We're using Cloud Functions to transform our data in BigQuery:

- all data is in BigQuery
- to transform the data, we only use SQL queries in BigQuery (a simplified sketch of one of our functions is shown after this list)
- each query runs once a day
- our biggest SQL query runs for about 2 to 3 minutes, but most queries run for less than 30 seconds
- we have about 50 queries executed once a day, and this number is increasing
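For reference, here is a minimal sketch of the kind of function we deploy, assuming a Pub/Sub-triggered Cloud Function (invoked daily by Cloud Scheduler) and the `google-cloud-bigquery` client library. The project, dataset, table names, and the `run_transformation` entry point are made up for illustration:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical transformation; our real queries are similar in spirit.
QUERY = """
    CREATE OR REPLACE TABLE `my_project.reporting.daily_sales` AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM `my_project.raw.orders`
    GROUP BY order_date
"""

def run_transformation(event, context):
    """Entry point: runs one SQL transformation in BigQuery."""
    job = client.query(QUERY)  # starts the query job in BigQuery
    job.result()               # waits for completion (seconds, not minutes)
    print(f"Job {job.job_id} done, {job.total_bytes_processed} bytes processed")
```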
We first tried to do the same thing (SQL queries in BigQuery) with Dataflow, but:

- it took about 10 to 15 minutes just to start a Dataflow job
- it is more complicated to code than our Cloud Functions
- at that time, Dataflow SQL was not yet available
Every time we talk with someone using GCP (users, trainers, or auditors), they recommend using Dataflow. So did we miss something "magic" about Dataflow for our use case? Is there a way to make it start in seconds rather than minutes?
Also, if we use streaming in Dataflow, how are costs calculated? I understand that in batch mode we pay only for what we use, but what about streaming? Is it billed as a service running full-time?
Thanks for your help