I have some questions related to Cloud Composer and BigQuery. We need to build an automated process that exports tables from BigQuery to Cloud Storage. I have four options at the moment:
- bigquery_to_gcs operator (see the sketch after this list).
- BashOperator: run the "bq" command provided by the Cloud SDK on Cloud Composer.
- Python function: write a function that uses the BigQuery API, essentially the same as bigquery_to_gcs, and execute it with Airflow.
- Dataflow: the job would also be orchestrated by Airflow.
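For reference, here is a minimal sketch of what I mean by option 1, assuming Airflow 2.x with the Google provider installed (on older Composer images the operator lives at airflow.contrib.operators.bigquery_to_gcs as BigQueryToCloudStorageOperator); the project, dataset, table, and bucket names are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

with DAG(
    dag_id="export_bq_table_to_gcs",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submits a BigQuery extract job and waits for it to complete.
    export_table = BigQueryToGCSOperator(
        task_id="export_table",
        source_project_dataset_table="my-project.my_dataset.my_table",  # placeholder
        destination_cloud_storage_uris=["gs://my-bucket/exports/my_table.csv"],  # placeholder
        export_format="CSV",
    )
```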
I have some concerns about the first three options, though. If the table is huge, could the export consume a large share of Cloud Composer's resources? I've been trying to find out whether the BashOperator and the BigQuery operator consume Cloud Composer resources, keeping in mind that this process will eventually run in production with more DAGs running at the same time. If that's the case, would Dataflow be the more suitable option?
One advantage of Dataflow is that it can write the table to a single file if we want; that isn't possible with the other options when the table is larger than 1 GB.
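To illustrate the 1 GB point: BigQuery extract jobs (whether triggered by the operator, the "bq" command, or the API) shard the output into multiple files once the exported data exceeds 1 GB, so the destination URI must contain a wildcard. A rough sketch of the BashOperator variant, assuming Airflow 2.x, the same placeholder names as above, and that the task sits inside the DAG definition shown earlier:

```python
from airflow.operators.bash import BashOperator

# "bq extract" shards the output; the "*" in the destination URI is mandatory
# once the exported data exceeds 1 GB.
export_with_bq_cli = BashOperator(
    task_id="export_with_bq_cli",
    bash_command=(
        "bq extract --destination_format=CSV "
        "my-project:my_dataset.my_table "
        "gs://my-bucket/exports/my_table-*.csv"
    ),
)
```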