1
votes

The reference architecture above shows a Cloud Storage sink fed from Cloud Dataflow. However, the Beam API, which seems to be the current default Dataflow API, has no Cloud Storage I/O connector listed.

Can anyone clarify whether such a connector exists and, if not, what the alternative is for getting data from Dataflow into Cloud Storage?

2 Answers

4
votes

Beam does support reading from and writing to GCS. You simply use the TextIO classes.

https://beam.apache.org/documentation/sdks/javadoc/0.2.0-incubating/org/apache/beam/sdk/io/TextIO.html

To read a PCollection from one or more text files, use TextIO.Read. You can instantiate a transform using TextIO.Read.from(String) to specify the path of the file(s) to read from (e.g., a local filename or filename pattern if running locally, or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>").
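
To make the path form concrete, here is a minimal plain-Java sketch (no Beam dependency) that splits a "gs://<bucket>/<filepath>" string into its bucket and file pattern. The class name `GcsPathSketch`, the `split` helper and the bucket name `example-bucket` are hypothetical and only for illustration; this is not how Beam itself parses paths.

```java
// Hypothetical helper, not part of Beam: illustrates the "gs://<bucket>/<filepath>" shape.
final class GcsPathSketch {
    static final String SCHEME = "gs://";

    /** Splits "gs://<bucket>/<filepath>" into {bucket, filepath}. */
    static String[] split(String path) {
        if (!path.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not a GCS path: " + path);
        }
        String rest = path.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            // Covers both "gs:///..." (empty bucket) and "gs://bucket" (no file path).
            throw new IllegalArgumentException("Missing bucket or file path: " + path);
        }
        return new String[] { rest.substring(0, slash), rest.substring(slash + 1) };
    }

    public static void main(String[] args) {
        // "example-bucket" is a made-up bucket name; the file part may be a pattern.
        String[] parts = split("gs://example-bucket/logs/2017-*.txt");
        System.out.println("bucket = " + parts[0]);
        System.out.println("file pattern = " + parts[1]);
    }
}
```

The whole string, scheme included, is what you hand to the read or write transform. Note that the Javadoc quoted above is for 0.2.0-incubating; in Beam 2.x releases the same call is spelled `TextIO.read().from("gs://...")`, with `TextIO.write().to("gs://...")` for the sink side.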

1
votes

You can use TextIO, AvroIO, or any other connector that reads from or writes to files to interact with GCS. Beam treats any file path that starts with "gs://" as a GCS path and resolves it through the pluggable FileSystem [1] interface.

[1] https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/storage/GcsFileSystem.java
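
The scheme-based dispatch described above can be sketched roughly as follows. This is a toy model in plain Java, not Beam's actual code: `SchemeRegistrySketch`, `ToyFileSystem`, `schemeOf` and `resolve` are hypothetical names, and falling back to "file" for paths without a scheme is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a pluggable file system registry keyed by URI scheme.
final class SchemeRegistrySketch {
    /** Hypothetical stand-in for a pluggable file system; NOT Beam's FileSystem class. */
    interface ToyFileSystem {
        String describe(String path);
    }

    private final Map<String, ToyFileSystem> byScheme = new HashMap<>();

    void register(String scheme, ToyFileSystem fs) {
        byScheme.put(scheme, fs);
    }

    /** Returns the text before "://", or "file" when the path has no scheme (assumption). */
    static String schemeOf(String path) {
        int idx = path.indexOf("://");
        return idx > 0 ? path.substring(0, idx) : "file";
    }

    /** Looks up the implementation registered for the path's scheme and delegates to it. */
    String resolve(String path) {
        ToyFileSystem fs = byScheme.get(schemeOf(path));
        if (fs == null) {
            throw new IllegalArgumentException("No file system registered for: " + path);
        }
        return fs.describe(path);
    }

    static SchemeRegistrySketch withDefaults() {
        SchemeRegistrySketch registry = new SchemeRegistrySketch();
        registry.register("file", p -> "local:" + p); // local disk
        registry.register("gs", p -> "gcs:" + p);     // Google Cloud Storage
        return registry;
    }

    public static void main(String[] args) {
        SchemeRegistrySketch registry = withDefaults();
        System.out.println(registry.resolve("gs://example-bucket/out/part-0.txt"));
        System.out.println(registry.resolve("/tmp/out/part-0.txt"));
    }
}
```

The point of the design is that connectors such as TextIO or AvroIO only ever deal with path strings; which storage backend handles the bytes is decided by the scheme, so no GCS-specific connector needs to appear in the I/O list.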