I'm trying to run PubSubToBigQuery.java from https://github.com/GoogleCloudPlatform/DataflowTemplates locally with the direct runner. However, I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: Class interface com.google.cloud.teleport.templates.PubSubToBigQuery$Options missing a property named 'gcs-location'.
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1518)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:111)
    at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:294)
    at com.google.cloud.teleport.templates.PubSubToBigQuery.main(PubSubToBigQuery.java:165)

But I already pass --gcs-location=gs://xxx-templates/dataflow/pipelines/pubsub-to-bigquery when I run it.

The error is thrown at this line: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java#L176

https://beam.apache.org/documentation/runners/direct/

1 Answer

You're confusing the args passed to the Java application with the args passed to run the templated pipeline via the CLI.

--gcs-location is an argument to gcloud dataflow jobs run on the CLI, not to the Java application. When you run the Java app with --templateLocation, Dataflow stages your pipeline on GCS (the template), but doesn't run the pipeline immediately. --gcs-location then tells gcloud dataflow jobs run where to find the template to run.
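To make the error itself less mysterious, here is a minimal sketch of strict option parsing. It is not Beam's actual PipelineOptionsFactory code, only an illustration under my own assumptions: the class StrictOptionsSketch, its nested Options interface, and the three getters on it are all hypothetical stand-ins. The idea is that flag names are matched against the getters declared on the options interface, so a flag that belongs to a different tool (here --gcs-location) is rejected.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StrictOptionsSketch {

    /** Hypothetical stand-in for a pipeline's Options interface. */
    public interface Options {
        String getProject();
        String getTemplateLocation();
        String getRunner();
    }

    /** Derives property names from getters, e.g. getTemplateLocation -> templateLocation. */
    public static Set<String> declaredProperties(Class<?> iface) {
        Set<String> names = new TreeSet<>();
        for (Method m : iface.getMethods()) {
            String n = m.getName();
            if (n.startsWith("get") && n.length() > 3) {
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        return names;
    }

    /** Parses --name=value flags and rejects any name the interface does not declare. */
    public static Map<String, String> parse(Class<?> iface, String... args) {
        Set<String> declared = declaredProperties(iface);
        Map<String, String> parsed = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 0) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            String name = arg.substring(2, eq);
            if (!declared.contains(name)) {
                // Same shape of failure as in the question: the flag is unknown to this interface.
                throw new IllegalArgumentException(
                    "Class " + iface + " missing a property named '" + name + "'.");
            }
            parsed.put(name, arg.substring(eq + 1));
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Flags declared on the interface are accepted.
        System.out.println(parse(Options.class, "--runner=DataflowRunner"));
        try {
            // A gcloud-only flag is not a property of the interface, so parsing fails.
            parse(Options.class, "--gcs-location=gs://some-bucket/template");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the sketch, --runner parses because the interface declares getRunner(), while --gcs-location fails because nothing on the interface corresponds to it. That is the same situation as passing a gcloud flag to the Java app.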

You can't execute a templated pipeline locally. Running the Java app locally only stages the template.

https://cloud.google.com/dataflow/docs/guides/templates/executing-templates

    # Set the runner
    RUNNER=DataflowRunner

    # Build the template <--NOTE THIS
    mvn compile exec:java \
    -Dexec.mainClass=com.google.cloud.teleport.templates.PubSubToBigQuery \
    -Dexec.cleanupDaemonThreads=false \
    -Dexec.args=" \
    --project=${PROJECT_ID} \
    --stagingLocation=${PIPELINE_FOLDER}/staging \
    --tempLocation=${PIPELINE_FOLDER}/temp \
    --templateLocation=${PIPELINE_FOLDER}/template \
    --runner=${RUNNER}"

    # Execute the template <--NOTE THIS
    JOB_NAME=pubsub-to-bigquery-$USER-`date +"%Y%m%d-%H%M%S%z"`

    gcloud dataflow jobs run ${JOB_NAME} \
    --gcs-location=${PIPELINE_FOLDER}/template \
    --zone=us-east1-d \
    --parameters \
    "inputTopic=projects/data-analytics-pocs/topics/teleport-pubsub-to-bigquery,\
    outputTableSpec=data-analytics-pocs:demo.pubsub_to_bigquery,\
    outputDeadletterTable=data-analytics-pocs:demo.pubsub_to_bigquery_deadletter"