I have two tables like:
ID Name Age
1 Alex 20
2 Sarah 21 and so on
.....................
ID Name Marks
1 Alex 80
2 Sarah 78 and so on
.....................
I want to join these two tables using Cloud Dataflow (Apache Beam) on more than one key (joining condition) i. e. ID and Name both are the common columns. How can I do so?
I have tried joining it using one key (one common column) but I don't know how to use more than one key
I have used this code as a reference:
class JoinTables:
def add_key_details(self, row, key_details):
result = row.copy()
try:
result.update(key_details[row['name']])
except KeyError as err:
traceback.print_exc()
logging.error("Name Not Found error: %s", err)
return result
def run(argv=None):
jointables = JoinTables()
table1= (p
| 'Read table1 details from BigQuery ' >> beam.io.Read(
beam.io.BigQuerySource(
query='SELECT * FROM `dataset.table1`',
use_standard_sql=True
)
)
| 'Key Details 1' >> beam.Map(lambda row: (row['name'], row))
)
table2 = (p
| 'Read table2 details from BigQuery ' >> beam.io.Read(
beam.io.BigQuerySource(
query='SELECT * FROM `dataset.table2`',
use_standard_sql=True
)
)
| 'Join data with side input 1' >> beam.Map(jointables.add_key_details, AsDict(table1))
)