I try to rename bigquery rows in an Apache Beam Pipeline in Python like in the following example : Having 1 PCollection with the full data and 1 other with only 3 fields renamed col1 in col1.2, col2 in col2.2...
How can I apply my filter correctly to get that second PCollection with renamed rows ?
def is_filtered(row):
row['col1'] == row['col1.2']
row['col2'] == row['col2.2']
row['col3'] == row['col3.2']
yield row
with beam.Pipeline() as pipeline:
query = open('query.sql', 'r')
bq_source = beam.io.BigQuerySource(query=query.read(),
use_standard_sql=True)
main_table = \
pipeline \
| 'ReadBQData' >> beam.io.Read(bq_source) \
cycle_table = (
pipeline
| 'FilterMainTable' >> beam.Filter(is_filtered, main_table))
I also thaught about using Partition but the Partition examples I found were more about partitioning the content of the rows and not the row itself