I am creating a Dataproc cluster on GCP using a workflow template defined in a YAML file. Once the cluster is created, all the steps start executing in parallel, but I want some steps to execute only after all of the other steps have completed. Is there any way to achieve this?
Sample YAML used for cluster creation:
jobs:
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /ui.sh
  stepId: run-pig-ui
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /hotel.sh
  stepId: run-pig-hotel
placement:
  managedCluster:
    clusterName: cluster-abc
    labels:
      data: cluster
    config:
      configBucket: bucket-1
      initializationActions:
      - executableFile: gs://bucket-1/install_git.sh
        executionTimeout: 600s
      gceClusterConfig:
        zoneUri: asia-south1-a
        tags:
        - test
      masterConfig:
        machineTypeUri: n1-standard-8
        diskConfig:
          bootDiskSizeGb: 50
      workerConfig:
        machineTypeUri: n1-highcpu-32
        numInstances: 2
        diskConfig:
          bootDiskSizeGb: 100
      softwareConfig:
        imageVersion: 1.4-ubuntu18
        properties:
          core:io.compression.codec.lzo.class: com.hadoop.compression.lzo.LzoCodec
          core:io.compression.codecs: org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
      secondaryWorkerConfig:
        numInstances: 2
        isPreemptible: true
The command used to instantiate the workflow template and create the cluster:
gcloud dataproc workflow-templates instantiate-from-file --file file_name.yaml
gcloud version: 261.0.0
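
While reading the WorkflowTemplate API reference I came across a prerequisiteStepIds field on each job, which looks like it might express this ordering. Below is a minimal sketch of what I think the jobs section would need to look like; the run-pig-final step and its finalize.sh script are hypothetical, added only to illustrate a step that waits for the others. Would something like this work?

jobs:
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /ui.sh
  stepId: run-pig-ui
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /hotel.sh
  stepId: run-pig-hotel
- pigJob:
    queryList:
      queries:
      - sh /finalize.sh   # hypothetical script for the final step
  stepId: run-pig-final
  # prerequisiteStepIds should make this step start only after
  # the listed steps have completed
  prerequisiteStepIds:
  - run-pig-ui
  - run-pig-hotel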