I am using Apache Beam on Python and would like to ask what is the equivalent of Apache Beam Java Wait.on()
on python SDK?
currently I am having problem with this code snippet below
if len(output_pcoll) > 1:
merged = (tuple(output_pcoll) |
'MergePCollections1' >> beam.Flatten())
else:
merged = output_pcoll[0]
outlier_side_input = self.construct_outlier_side_input(merged)
(merged |
"RemoveOutlier" >>
beam.ParDo(utils.Remove_Outliers(),
beam.pvalue.AsDict(outlier_side_input)) |
"WriteToCSV" >>
beam.io.WriteToText('../../ML-DATA/{0}.{1}'.format(self.BUCKET,
self.OUTPUT), num_shards=1))
it seems Apache Beam does not wait until the code on self.construct_outlier_side_input
finished executing and result in empty side input when executing "RemoveOutlier" in the next pipeline. In Java version you can use Wait.On()
to wait for construct_outlier_side_input
to finish executing, however I could not find the equivalent method in the Python SDK.
--Edit-- what i am trying to achieve is almost the same as in this link, https://rmannibucau.metawerx.net/post/apache-beam-initialization-destruction-task