4
votes

Is it possible to run code located in Google Cloud Datalab on Dataproc clusters?

The idea is to use Datalab's interactivity and interface to run Apache Spark code.

2
Were you able to get an answer to this, or is Datalab = Dataproc + Jupyter notebook? – mobcdi

2 Answers

3
votes

This is on our radar but not yet fully enabled as an init action for a Dataproc cluster.

Thanks.
Dinesh Kulkarni, Product Manager, Datalab & Machine Learning, GCP

1
votes

This is now possible. Create a Dataproc cluster with the Datalab initialization action using this command:

gcloud dataproc clusters create $CLUSTERNAME \
    --project $PROJECT \
    --num-workers $WORKERS \
    --bucket $BUCKET \
    --metadata startup-script-url=gs://$BUCKET/setup/setup_env.sh,BUCKET=$BUCKET \
    --master-machine-type $VMMASTER \
    --worker-machine-type $VMWORKER \
    --initialization-actions \
         gs://dataproc-initialization-actions/datalab/datalab.sh \
    --scopes cloud-platform
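
The command above expects several shell variables to be set beforehand. A minimal sketch of defining them might look like the following; every value here is a hypothetical placeholder (the answer does not specify any), so substitute your own project, bucket, and machine types:

```shell
# Hypothetical placeholder values (assumptions, not from the answer).
PROJECT="my-project-id"        # your GCP project ID
BUCKET="my-staging-bucket"     # an existing Cloud Storage bucket name
CLUSTERNAME="datalab-cluster"  # name for the new Dataproc cluster
WORKERS=2                      # number of worker nodes
VMMASTER="n1-standard-4"       # machine type for the master node
VMWORKER="n1-standard-4"       # machine type for the worker nodes

# Fail early if any variable was left empty.
for name in PROJECT BUCKET CLUSTERNAME WORKERS VMMASTER VMWORKER; do
    eval "value=\${$name}"
    if [ -z "$value" ]; then
        echo "Missing value for $name" >&2
        exit 1
    fi
done

echo "Ready to create $CLUSTERNAME in $PROJECT with $WORKERS workers"
```

With these set in the same shell session, the gcloud command can be run as written.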

To make this even easier, you can use this script: https://github.com/kanjih-ciandt/script-dataproc-datalab/tree/master