I'm trying to install Jupyter Notebook / Datalab on my Dataproc cluster, but to no avail.
I followed this tutorial: https://cloud.google.com/dataproc/docs/tutorials/dataproc-datalab
Step by step:
- I created a new GCS bucket called `datalab-init-bucket-001` and uploaded the `datalab.sh` script from GitHub (https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/master/datalab/datalab.sh) to it.
- Then I started the Dataproc cluster via `gcloud` with `--initialization-actions 'gs://datalab-init-bucket-001/datalab.sh'`; the entire command looks like:

```
gcloud dataproc clusters create cluster-test \
    --subnet default \
    --zone "" \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 10 \
    --num-workers 2 \
    --worker-machine-type n1-standard-2 \
    --worker-boot-disk-size 10 \
    --initialization-action-timeout "10h" \
    --initialization-actions 'gs://datalab-init-bucket-001/datalab.sh'
```
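Before creating the cluster, a pre-flight sanity check on the script could look like this (the `preflight` helper and the commented-out `gsutil` line are my own illustration, not part of the tutorial):

```shell
#!/bin/bash
# Hypothetical pre-flight check before uploading an init action:
# run a bash syntax check so an obviously broken script never reaches GCS.
preflight() {
    if bash -n "$1"; then
        echo "syntax OK: $1"
    else
        echo "syntax error in $1"
        return 1
    fi
}

# Usage (the gsutil step requires authenticated credentials; shown for completeness):
# preflight datalab.sh && gsutil cp datalab.sh gs://datalab-init-bucket-001/datalab.sh
```

This only rules out a truncated or corrupted local copy; it cannot catch cluster-side failures.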
Here the first problem arises. Looking at the logs:
```
OK > Downloading script [gs://datalab-init-bucket-001/datalab.sh] to [/etc/google-dataproc/startup-scripts/dataproc-initialization-script-0]
OK > Running script [/etc/google-dataproc/startup-scripts/dataproc-initialization-script-0] and saving output in [/var/log/dataproc-initialization-script-0.log]
OK > DIR* completeFile: /user/spark/eventlog/.cc2b1d00-4968-4008-87d7-eac090b09e56 is closed by DFSClient_NONMAPREDUCE_1150019196_1
ERROR > AgentRunner startup failed:
com.google.cloud.hadoop.services.agent.AgentException: Initialization action failed to start (error=2, No such file or directory). Failed action 'gs://datalab-init-bucket-001/datalab.sh' (TASK_FAILED)
    at com.google.cloud.hadoop.services.agent.AgentException$Builder.build(AgentException.java:83)
    at com.google.cloud.hadoop.services.agent.AgentException$Builder.buildAndThrow(AgentException.java:79)
    at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.throwInitActionFailureException(BootstrapActionRunner.java:236)
    at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runSingleCustomInitializationScriptWithTimeout(BootstrapActionRunner.java:146)
    at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runCustomInitializationActions(BootstrapActionRunner.java:126)
    at com.google.cloud.hadoop.services.agent.AbstractAgentRunner.runCustomInitializationActionsIfFirstRun(AbstractAgentRunner.java:150)
    at com.google.cloud.hadoop.services.agent.MasterAgentRunner.initialize(MasterAgentRunner.java:165)
    at com.google.cloud.hadoop.services.agent.AbstractAgentRunner.start(AbstractAgentRunner.java:68)
    at com.google.cloud.hadoop.services.agent.MasterAgentRunner.start(MasterAgentRunner.java:36)
    at com.google.cloud.hadoop.services.agent.AgentMain.lambda$boot$0(AgentMain.java:63)
    at com.google.cloud.hadoop.services.agent.AgentStatusReporter.runWith(AgentStatusReporter.java:52)
    at com.google.cloud.hadoop.services.agent.AgentMain.boot(AgentMain.java:59)
    at com.google.cloud.hadoop.services.agent.AgentMain.main(AgentMain.java:46)
Caused by: java.io.IOException: Cannot run program "/etc/google-dataproc/startup-scripts/dataproc-initialization-script-0": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at com.google.cloud.hadoop.services.agent.util.NativeAsyncProcessWrapperFactory.startAndWrap(NativeAsyncProcessWrapperFactory.java:33)
    at com.google.cloud.hadoop.services.agent.util.NativeAsyncProcessWrapperFactory.startAndWrap(NativeAsyncProcessWrapperFactory.java:27)
    at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.createRunner(BootstrapActionRunner.java:349)
    at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runScriptAndPipeOutputToGcs(BootstrapActionRunner.java:301)
    at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runSingleCustomInitializationScriptWithTimeout(BootstrapActionRunner.java:142)
    ... 9 more
    Suppressed: java.io.IOException: Cannot run program "/etc/google-dataproc/startup-scripts/dataproc-initialization-script-0": error=2, No such file or directory
        ... 15 more
    Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 14 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 14 more
```
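For what it's worth, `error=2, No such file or directory` on a script that the log says was downloaded successfully can be caused by Windows (CRLF) line endings, which turn the shebang into `#!/bin/bash\r` and make the kernel report the interpreter as missing. A sketch of a check one could run against a local copy of the script (the `check_line_endings` helper is illustrative, not from the tutorial):

```shell
#!/bin/bash
# Hypothetical diagnosis: a first line ending in \r\n gives a shebang of
# "#!/bin/bash\r", which fails with "No such file or directory".
# (A fix would be: sed -i 's/\r$//' datalab.sh)
check_line_endings() {
    if head -n 1 "$1" | grep -q "$(printf '\r')"; then
        echo "CRLF detected in $1"
        return 1
    else
        echo "$1 line endings look fine"
    fi
}

# Usage against the copy the cluster actually fetched (requires gsutil auth):
# gsutil cp gs://datalab-init-bucket-001/datalab.sh . && check_line_endings datalab.sh
```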
I somehow managed to start Datalab on a single-node cluster, but I was not able to start a (Py)Spark session there.
I'm running the latest Dataproc image version (1.2); version 1.1, for example, didn't work either. I have a free-credits account, but I guess that shouldn't be a problem.
Any idea how to update the `datalab.sh` script to make this work?
> What about the `--scopes 'https://www.googleapis.com/auth/cloud-platform'` flag? – tix