Is it possible to run code located in Google Cloud Datalab on Dataproc clusters?
The idea is to use Datalab's great interactivity and interface to run Apache Spark code.
Yes, this is now possible. Create a Dataproc cluster with the Datalab initialization action using this command:
gcloud dataproc clusters create $CLUSTERNAME \
    --project $PROJECT \
    --num-workers $WORKERS \
    --bucket $BUCKET \
    --metadata startup-script-url=gs://$BUCKET/setup/setup_env.sh,BUCKET=$BUCKET \
    --master-machine-type $VMMASTER \
    --worker-machine-type $VMWORKER \
    --initialization-actions \
    gs://dataproc-initialization-actions/datalab/datalab.sh \
    --scopes cloud-platform
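The command above expects several shell variables to be set beforehand. A minimal sketch of how that might look is below. Every value shown (cluster name, project ID, bucket name, worker count, machine types) is a hypothetical placeholder and not taken from the original answer, so replace them with your own. The loop only checks that nothing was left empty before you run gcloud.

```shell
# Hypothetical placeholder values (assumptions); replace with your own.
CLUSTERNAME="datalab-cluster"   # name of the Dataproc cluster to create
PROJECT="my-project-id"         # your Google Cloud project ID
WORKERS=2                       # number of worker nodes
BUCKET="my-bucket"              # Cloud Storage bucket name, without gs://
VMMASTER="n1-standard-4"        # machine type for the master node
VMWORKER="n1-standard-4"        # machine type for the worker nodes

# Check that every variable used by the gcloud command is non-empty.
missing=0
for name in CLUSTERNAME PROJECT WORKERS BUCKET VMMASTER VMWORKER; do
  eval "value=\${$name}"
  if [ -z "$value" ]; then
    echo "Missing value for $name" >&2
    missing=1
  fi
done

if [ "$missing" -eq 0 ]; then
  echo "All variables set for cluster $CLUSTERNAME"
fi
```

Run this in the same shell session as the gcloud command so the variables are visible to it.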
To make it even easier, you can use this script: https://github.com/kanjih-ciandt/script-dataproc-datalab/tree/master