My project is running Python 2.7 (yes, I know...) with Apache Beam 2.19 on Google Dataflow. We're connecting to BigQuery in the same way that's specified in the Apache Beam tutorial:
p | 'Get data from BigQuery' >> beam.io.Read(beam.io.BigQuerySource(
    query=get_query(limit),
    use_standard_sql=True))
However, the read step of this pipeline is incredibly slow, most likely because of how the exported .avro files are read. It doesn't seem like fastavro is actually being used, though. AFAIK, you have to enable the use_fastavro flag explicitly when running on Python < 3.7. Is that even possible with this setup, or will I need to export to GCS manually first?
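For reference, this is roughly how I'd expect to enable it, by passing it as an experiment through the pipeline options. I haven't been able to confirm that BigQuerySource actually honors use_fastavro on Beam 2.19 / Python 2.7, so treat the flag placement here as my assumption (the project and bucket names are placeholders):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Assumption: use_fastavro is passed as an experiment via pipeline options.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',                 # placeholder project id
    '--temp_location=gs://my-bucket/tmp',   # placeholder bucket
    '--experiments=use_fastavro',           # the flag in question
])

with beam.Pipeline(options=options) as p:
    rows = p | 'Get data from BigQuery' >> beam.io.Read(
        beam.io.BigQuerySource(
            query=get_query(limit),  # get_query/limit are defined elsewhere in our code
            use_standard_sql=True))

If this is supposed to work, the job still doesn't show any speedup for me, which is why I suspect the experiment isn't being picked up at all on Python 2.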