We use 50-100 Bigtable nodes; the exact count varies throughout the day depending on the amount of data we process.
Every day we run a Dataflow job that scans the entire table (restricted to one specific column family) and dumps the scanned data to GCS.
We have been experimenting with how the number of worker nodes affects the speed of scanning the entire table. Typically, the job scans ~100M rows (there are many more rows in the table, but we restrict the range to a 24-hour window), and the scanned data is roughly 1 TiB in size.
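For reference, our job follows the standard "BigtableIO read, then write to GCS" pattern. A minimal sketch of what it looks like is below; the project/instance/table IDs, the column-family name, the timestamp-prefixed row-key format, and the GCS path are all placeholders, not our real values, and the real job serializes the full row rather than just the key.

```java
import com.google.bigtable.v2.Row;
import com.google.bigtable.v2.RowFilter;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.io.range.ByteKey;
import org.apache.beam.sdk.io.range.ByteKeyRange;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class DailyScanJob {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Hypothetical key bounds covering the 24-hour window; assumes the row key
    // starts with a date-like prefix.
    ByteKey start = ByteKey.copyFrom("2024-01-01#".getBytes(StandardCharsets.UTF_8));
    ByteKey end   = ByteKey.copyFrom("2024-01-02#".getBytes(StandardCharsets.UTF_8));

    p.apply("ReadFromBigtable",
            BigtableIO.read()
                .withProjectId("my-project")    // placeholder
                .withInstanceId("my-instance")  // placeholder
                .withTableId("my-table")        // placeholder
                .withKeyRange(ByteKeyRange.of(start, end))
                // Restrict the scan to the one column family we export.
                .withRowFilter(
                    RowFilter.newBuilder().setFamilyNameRegexFilter("my_cf").build()))
        .apply("FormatRows",
            MapElements.into(TypeDescriptors.strings())
                .via((Row row) -> row.getKey().toStringUtf8()))
        .apply("WriteToGCS", TextIO.write().to("gs://my-bucket/daily-scan/part"));

    p.run();
  }
}
```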
With the number of Bigtable nodes fixed (say, at 80), we incrementally varied the number of Dataflow workers (n1-standard-1) from 15 to 50, and the scanning speed did not scale linearly. Similarly, with the number of Dataflow workers fixed at 50, we varied the number of Bigtable nodes between 40 and 80, and the read throughput did not change much (as long as there were "enough" Bigtable nodes). Given this, what other options do we have for making the scanning job faster? One idea is to run several scanning jobs, each scanning a contiguous subset of rows (sketched below), but we would prefer to avoid this approach.
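To make that idea concrete, the splitting we have in mind looks roughly like the following: several BigtableIO reads over contiguous key sub-ranges (shown here flattened inside a single pipeline rather than as separate jobs). The split points and IDs are made up for illustration.

```java
import com.google.bigtable.v2.Row;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.io.range.ByteKey;
import org.apache.beam.sdk.io.range.ByteKeyRange;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class ShardedScanSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Hypothetical split points inside the 24-hour key range.
    List<String> bounds =
        Arrays.asList("2024-01-01#00", "2024-01-01#08", "2024-01-01#16", "2024-01-02#00");

    PCollectionList<Row> shards = PCollectionList.empty(p);
    for (int i = 0; i < bounds.size() - 1; i++) {
      ByteKey start = ByteKey.copyFrom(bounds.get(i).getBytes(StandardCharsets.UTF_8));
      ByteKey end = ByteKey.copyFrom(bounds.get(i + 1).getBytes(StandardCharsets.UTF_8));
      PCollection<Row> shard =
          p.apply("ReadShard" + i,
              BigtableIO.read()
                  .withProjectId("my-project")    // placeholder
                  .withInstanceId("my-instance")  // placeholder
                  .withTableId("my-table")        // placeholder
                  .withKeyRange(ByteKeyRange.of(start, end)));
      shards = shards.and(shard);
    }

    // Formatting and the write-to-GCS step are omitted here for brevity.
    PCollection<Row> allRows = shards.apply(Flatten.pCollections());

    p.run();
  }
}
```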
Any help will be really appreciated!