I'm developing a Python program to use as a Google Dataflow template. What it does is write data from Pub/Sub into BigQuery:
import apache_beam as beam
from apache_beam.options.pipeline_options import StandardOptions

# Streaming mode is needed for the unbounded Pub/Sub source.
pipeline_options.view_as(StandardOptions).streaming = True
p = beam.Pipeline(options=pipeline_options)

(p
 # This is the source of the pipeline.
 | 'Read from PubSub' >> beam.io.ReadFromPubSub('projects/.../topics/...')
 # <Transformation code if needed>
 # Destination: wrap each message in a single-column row.
 | 'String To BigQuery Row' >> beam.Map(lambda s: dict(Trama=s))
 | 'Write to BigQuery' >> beam.io.Write(
     beam.io.BigQuerySink(
         known_args.output,  # table spec parsed from the command line
         schema='Trama:STRING',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))

p.run().wait_until_finish()
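For context, known_args and pipeline_options come from the usual command-line parsing, which I left out above. A rough sketch of that part (only the --output flag is actually used by the snippet; the rest is standard boilerplate):

import argparse

from apache_beam.options.pipeline_options import PipelineOptions

parser = argparse.ArgumentParser()
# --output is the only flag the pipeline snippet relies on (known_args.output).
parser.add_argument('--output', required=True,
                    help='BigQuery table spec, e.g. PROJECT:DATASET.TABLE')
known_args, pipeline_args = parser.parse_known_args()

# Any remaining flags (runner, project, temp_location, ...) become pipeline options.
pipeline_options = PipelineOptions(pipeline_args)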
The code is running locally; it is not on Google Dataflow yet.
This "works", but not the way I want: the data currently ends up in the BigQuery streaming buffer and I cannot see it (even after waiting some time). When will it become available in BigQuery? Why is it stored in the streaming buffer instead of the "normal" table?
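For reference, the streaming buffer can also be inspected programmatically; a rough sketch with the google-cloud-bigquery client (the table reference below is a placeholder, not my real one):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table reference; mine is the table the pipeline writes to.
table = client.get_table('my-project.my_dataset.my_table')

# streaming_buffer is None once all rows have been flushed to managed storage.
if table.streaming_buffer is not None:
    print('Estimated rows in the streaming buffer:',
          table.streaming_buffer.estimated_rows)
print('Rows reported by the table metadata:', table.num_rows)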
I have queried the table (the usual LIMIT 1000 preview query), even from C# code instead of the Google Cloud console, but the rows are still missing. I imagine it may be because I am running this pipeline locally, and since it is a streaming pipeline it never finishes, so to end it I have to press the stop button. This is where I can see the difference between stopping and draining it, but I am not sure whether this could be the problem. - IoT user
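The query check mentioned above boils down to something like this (sketched in Python rather than C#, with a placeholder table name):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table; in my case it is the table the pipeline writes to.
rows = list(client.query(
    'SELECT * FROM `my-project.my_dataset.my_table` LIMIT 1000').result())
print('Rows returned:', len(rows))  # comes back empty for me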