I need to read an Avro file from local disk or GCS using Java. I followed this example from the docs (https://beam.apache.org/documentation/sdks/javadoc/2.0.0/index.html?org/apache/beam/sdk/io/AvroIO.html):
    Pipeline p = ...;
    // A Read from a GCS file (runs locally and using remote execution):
    Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
    PCollection<GenericRecord> records =
        p.apply(AvroIO.readGenericRecords(schema)
            .from("gs://my_bucket/path/to/records-*.avro"));
But when I try to process the records through a DoFn, there doesn't appear to be any data. The Avro file does contain data, and I was able to run a function that generates a schema from it. If anybody has advice, please share.
What is your DoFn doing? Can you post any more relevant code? Maybe post the full pipeline implementation. In the Dataflow UI, do you see the input element count remain at zero? – Andrew Nguonly