0
votes

I am using a Dataflow pipeline to stream data from IoT devices (via a Pub/Sub subscription) into BigQuery. I'm looking for a way to dynamically direct data from a given device into a BigQuery dataset that I can specify on the fly. Here's a typical situation: a user decides, "I want to start streaming data from IoT device A into BigQuery Dataset 5," then later decides, "Now I want to start streaming data from IoT device A into BigQuery Dataset 7."

I'm looking for a way to do this without updating the device configuration or restarting the Dataflow job. Is this possible? If not, what's the best way to do this?


1 Answer

2
votes

It should be possible by using DynamicDestinations. It lets you decide, per element, which table (and therefore which dataset) the element is written to. If you can't deduce the BigQuery dataset directly from the element you want to store, you can either join the element with the device configuration or build a lookup mechanism into your DynamicDestinations implementation.

Please take a look at the docs, where an example is also provided. https://beam.apache.org/releases/javadoc/2.13.0/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html
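
To make the lookup idea more concrete, here is a minimal sketch in plain Java of a device-to-dataset routing table. Everything in it is an assumption for illustration: the class `DeviceRouting`, its methods, and the project, dataset, and table names are hypothetical and not part of Beam. In a real job the mapping would have to live somewhere all workers can see (for example a side input or an external store), since an in-memory map like this one is local to a single JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical routing table (not a Beam class): maps a device id to a BigQuery dataset. */
class DeviceRouting {
    // In-memory stand-in for a shared configuration store.
    private final Map<String, String> datasetByDevice = new ConcurrentHashMap<>();
    private final String project;
    private final String table;
    private final String defaultDataset;

    DeviceRouting(String project, String table, String defaultDataset) {
        this.project = project;
        this.table = table;
        this.defaultDataset = defaultDataset;
    }

    /** Points a device at a new dataset; takes effect on the next lookup. */
    void route(String deviceId, String dataset) {
        datasetByDevice.put(deviceId, dataset);
    }

    /** Builds a table spec of the form project:dataset.table for the given device. */
    String tableSpecFor(String deviceId) {
        String dataset = datasetByDevice.getOrDefault(deviceId, defaultDataset);
        return project + ":" + dataset + "." + table;
    }

    public static void main(String[] args) {
        DeviceRouting routing = new DeviceRouting("my-project", "readings", "dataset_default");

        routing.route("device-A", "dataset_5");
        System.out.println(routing.tableSpecFor("device-A")); // my-project:dataset_5.readings

        // The user changes their mind; no restart is needed for the lookup to change.
        routing.route("device-A", "dataset_7");
        System.out.println(routing.tableSpecFor("device-A")); // my-project:dataset_7.readings
    }
}
```

Inside a DynamicDestinations subclass, getDestination could return the device id taken from the element, and getTable could wrap the string from a lookup like `tableSpecFor` in a TableDestination. Treat that wiring as a suggestion and check it against the linked Javadoc for your Beam version.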