I'm learning Windowing & Triggering concepts in Apache with the intention to:
- reads for unbounded sources (PubSub)
- write the incoming message to localhost disk every 5 sec FIXED Window interval
ISSUE: no output getting written to localhost disk (the pipeline did create a beam-team- folder and wrote some files in there but no output.csv in intended destination getting written every 5 sec.)
- running apache-beam==2.9.0, Python 2.7.10
- Tried both: DirectRunner, as well as DataFlowRunner (with GCS Bucket as destination)
Here's the code (thanks a lot in advance for any advice):
p = beam.Pipeline(runner=None, options=options, argv=None)
"""
#1) Read incoming messages & apply Windowing
"""
lines = p | "read_sub" >> beam.io.gcp.pubsub.ReadFromPubSub(topic=None, subscription=SUBSCRIBER, with_attributes=True) \
"""
#2) Apply 5 sec Windowing
"""
| 'window' >> beam.WindowInto(beam.window.FixedWindows(5))
"""
#3) apply Map() ops
"""
output = lines | "pardo" >> beam.Map(lambda x: x.data)
"""
#4) write out to localhost disk
"""
output | beam.io.WriteToText('output', file_name_suffix='.csv', header='time, colname1, colname2')
p.run().wait_until_finish()
thanks a lot in advance for any advice!
Cheers!