I've got two tables in BigQuery that are aggregated and processed, which may have up to 2 million and 10 million rows respectively.
They have very different columns, but each has the same primary key (IDXX). In table 1 there is one row for each IDXX, and in table 2 there may be up to 10 rows per IDXX.
I'd like to export these two tables from BigQuery in matching chunks. So for example:
- table1_chunk1.csv: should have IDXX 1 - 10 (10 rows)
- table2_chunk1.csv: should have IDXX 1 - 10 (could be up to 100 rows)
- table1_chunk2.csv: should have IDXX 11 - 20 (10 rows)
- table2_chunk2.csv: should have IDXX 11 - 20 (could be up to 100 rows)
What would be the best way to do this? Use Cloud Dataflow? Do it in Bash?
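To make the chunking concrete, here is a minimal Bash sketch of the kind of export I have in mind, using the bq CLI. The dataset name, chunk size, and IDXX upper bound are placeholders, and it assumes IDXX is a contiguous integer key:

```bash
#!/usr/bin/env bash
# Sketch only: export matching chunks of both tables as CSV via the bq CLI.
# Assumes IDXX is a contiguous integer key; names and bounds are placeholders.
set -euo pipefail

DATASET="my_dataset"   # placeholder dataset
CHUNK_SIZE=10          # IDXX values per chunk
MAX_IDXX=2000000       # rough upper bound on IDXX in table1

chunk=1
for ((start=1; start<=MAX_IDXX; start+=CHUNK_SIZE)); do
  end=$((start + CHUNK_SIZE - 1))
  # Same IDXX range for both tables, so chunk N of table1 matches chunk N of table2.
  for t in table1 table2; do
    bq query --use_legacy_sql=false --format=csv --max_rows=1000000 \
      "SELECT * FROM \`${DATASET}.${t}\` WHERE IDXX BETWEEN ${start} AND ${end} ORDER BY IDXX" \
      > "${t}_chunk${chunk}.csv"
  done
  chunk=$((chunk + 1))
done
```

This issues two queries per chunk, which seems slow and potentially costly at this scale, hence the question about whether Dataflow (or some other approach) would be a better fit.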
What does "IDXX: 1 - 10 (could be up to 100 rows)" mean? Please clarify. – Mikhail Berlyant