
I have a Dataflow SQL job that joins a streaming Pub/Sub topic with a BigQuery table and writes the result to a BigQuery table. When I add a new record (a new sales_region) to my table 'us_state_salesregions', the new sales_region is NOT visible in the result table.

Only after creating a new Dataflow job does the newly added sales_region become visible in the result table of the query.

SELECT tr.*, sr.sales_region
FROM pubsub.topic.`project-id`.transactions as tr
  INNER JOIN bigquery.table.`project-id`.dataflow_sql_dataset.us_state_salesregions AS sr
  ON tr.state = sr.state_code

What should I do to get the newly added sales_region in the result of the query (without starting a new Dataflow job)?


1 Answer


Bounded source reads (such as BigQuery) are treated as static inputs and are not re-read during the course of a streaming pipeline.

If your side table is small enough, you could set up looping timers that periodically re-read your BigQuery table and join against the refreshed data.
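
For illustration, here is a minimal sketch of that idea in the Beam Python SDK (Dataflow SQL itself doesn't expose timers, so this amounts to rewriting the job as a Beam pipeline). Instead of hand-written looping timers it uses Beam's built-in PeriodicImpulse transform, which gives the same periodic re-read behavior. The project, topic, and table names are taken from your question; the message format and refresh interval are assumptions.

import json
import time

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window
from apache_beam.transforms.periodicsequence import PeriodicImpulse

REFRESH_SECS = 300  # assumed refresh interval: re-read the lookup table every 5 minutes

def read_sales_regions(unused_impulse):
    # Re-query the lookup table on every impulse; returns (state_code, sales_region) pairs.
    from google.cloud import bigquery
    client = bigquery.Client()
    rows = client.query(
        'SELECT state_code, sales_region '
        'FROM `project-id.dataflow_sql_dataset.us_state_salesregions`').result()
    return [(row.state_code, row.sales_region) for row in rows]

def join_with_regions(message, regions):
    # Assumption: each Pub/Sub message is a JSON object with a 'state' field,
    # mirroring the SQL join condition tr.state = sr.state_code.
    record = json.loads(message.decode('utf-8'))
    record['sales_region'] = regions.get(record['state'])
    return record

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # Side input: an impulse fires every REFRESH_SECS, triggering a fresh read
    # of the table. apply_windowing=True puts each refresh in its own fixed
    # window so it can be matched against the main stream's windows.
    regions = (
        p
        | 'PeriodicImpulse' >> PeriodicImpulse(
            start_timestamp=time.time(),
            fire_interval=REFRESH_SECS,
            apply_windowing=True)
        | 'ReadLookupTable' >> beam.FlatMap(read_sales_regions))

    # Main input: the streaming Pub/Sub topic, windowed to line up with the
    # side input's refresh windows.
    result = (
        p
        | 'ReadPubSub' >> beam.io.ReadFromPubSub(
            topic='projects/project-id/topics/transactions')
        | 'WindowMain' >> beam.WindowInto(window.FixedWindows(REFRESH_SECS))
        | 'Join' >> beam.Map(
            join_with_regions, regions=beam.pvalue.AsDict(regions)))
    # (Writing 'result' back to BigQuery is omitted for brevity.)

With this setup, each main-input window sees the most recent read of us_state_salesregions, so a newly added sales_region shows up within roughly one refresh interval instead of requiring a new job.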