I tried running a Dataflow pipeline that reads from my local machine (Windows) and writes to Google Cloud Storage, using DirectPipelineRunner. The job failed with the FileNotFoundException below, so I believe the Dataflow job is unable to read my input location. I am running the job from my local machine against the GCP-hosted template I created. I can see the job in the GCP Dataflow dashboard, but it fails with the error below. I also tried prefixing the local path with the IP or hostname of my machine, but got the same FileNotFoundException. Please help.
Error:
java.io.FileNotFoundException: No files matched spec: C:/data/sampleinput.txt
at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
at org.apache.beam.sdk.io.FileBasedSource.split(FileBasedSource.java:261)
at com.google.cloud.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:275)
COMMAND TO RUN THE TEMPLATE:
gcloud dataflow jobs run jobname --gcs-location gs://<somebucketname of template>/<templatename> --parameters inputFilePattern=C:/data/sampleinput.txt,outputLocation=gs://<bucketname>/output/outputfile,runner=DirectPipelineRunner
CODE:
PCollection<String> textData = pipeline.apply("Read Text Data",
    TextIO.read().from(options.getInputFilePattern()));
textData.apply("Write Text Data",
    TextIO.write().to(options.getOutputLocation()));
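For context, the pipeline options interface backing these calls presumably looks roughly like the sketch below. This is an assumption on my part: the interface name and setter names are my guesses; only getInputFilePattern and getOutputLocation appear in the code above. The relevant detail is that templated Dataflow pipelines declare runtime parameters as ValueProvider so they can be passed via --parameters when the template is run:

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;

// Sketch (assumed, not the asker's actual code): a templated pipeline
// declares its runtime parameters as ValueProvider<String> so values
// supplied with --parameters at job-run time are resolved on the workers.
public interface MyPipelineOptions extends PipelineOptions {
    @Description("File pattern to read, e.g. gs://bucket/input/*.txt")
    ValueProvider<String> getInputFilePattern();
    void setInputFilePattern(ValueProvider<String> value);

    @Description("Output prefix, e.g. gs://bucket/output/outputfile")
    ValueProvider<String> getOutputLocation();
    void setOutputLocation(ValueProvider<String> value);
}
```

Note that when the template actually runs, it executes on Dataflow workers in GCP, which have no access to a C:/ path on a local machine.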