In my Apache Beam pipeline I have a collection that may be empty (it contains errors from processing, and there may be no errors).

I write those erroneous items to a file, errors.csv. I would like to skip creating that file if there are no errors, but currently Apache Beam creates the file (containing just the header line) even when the input PCollection of errors is empty.

My code:

TextIO.Write errorsWrite = TextIO.write()
    .withHeader(..)
    .to(..);

// errors may be empty when processing succeeds for every item
PCollection<ErrorItems> errors = ...
errors.apply("write errors to file", errorsWrite);

1 Answer

This doesn't seem to be documented anywhere except in a comment on a deprecated method, which implies that creating an empty file for an empty PCollection is the expected behavior.

The most reasonable workaround for this situation is to add code that runs once the pipeline has finished (for example, after receiving the PipelineResult), reads errors.csv, checks whether it contains anything other than the header line, and deletes the file if it does not.
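
A minimal sketch of that cleanup step, using only java.nio. It assumes the output ends up as a single file on the local filesystem (for instance because the write was configured with withoutSharding(); by default TextIO appends a shard suffix to the file name) and that the header occupies exactly one line. The class name EmptyErrorFileCleaner and the method deleteIfHeaderOnly are made up for this example and are not part of Beam.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Apache Beam.
public class EmptyErrorFileCleaner {

    /**
     * Deletes the file if it contains nothing but a one-line header
     * (or is completely empty). Returns true if the file was deleted.
     * Assumption: the file is local and the header is a single line.
     */
    public static boolean deleteIfHeaderOnly(Path file) throws IOException {
        if (!Files.exists(file)) {
            return false;
        }
        boolean hasData = false;
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            reader.readLine(); // skip the header (null if the file is empty)
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    hasData = true; // found at least one real record
                    break;
                }
            }
        }
        if (hasData) {
            return false;
        }
        Files.delete(file);
        return true;
    }
}
```

You would call it only after the pipeline has completed, e.g. after pipeline.run().waitUntilFinish(), with the path of the written file. If the output goes to a remote filesystem such as GCS, the same check would have to be done through that filesystem's client instead of java.nio.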