With the standard Dataproc image 1.5 (Debian 10, Hadoop 2.10, Spark 2.4), a Dataproc cluster cannot be created. The region is set to europe-west2.
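For reference, a create command matching this setup would look roughly like the following (the cluster name is a placeholder and the exact flags are my reconstruction, not the original command):

```shell
# Hypothetical reconstruction of the failing create command; the cluster
# name is a placeholder, not taken from the original report.
gcloud dataproc clusters create my-cluster \
    --region=europe-west2 \
    --image-version=1.5-debian10 \
    --master-machine-type=n1-standard-4 \
    --worker-machine-type=n1-standard-4 \
    --num-workers=2 \
    --master-boot-disk-size=500GB \
    --worker-boot-disk-size=500GB
```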
The Stackdriver log says:
"Failed to initialize node <name of cluster>-m: Component hdfs failed to activate See output in: gs://.../dataproc-startup-script_output"
Scanning through the output (gs://.../dataproc-startup-script_output), I can see that the HDFS activation failed:
Aug 18 13:21:59 activate-component-hdfs[2799]: + exit_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -ne 0 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + log_and_fail hdfs 'Component hdfs failed to activate' 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local component=hdfs
Aug 18 13:21:59 activate-component-hdfs[2799]: + local 'message=Component hdfs failed to activate'
Aug 18 13:21:59 activate-component-hdfs[2799]: + local error_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local client_error_indicator=
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -eq 2 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 'StructuredError{hdfs, Component hdfs failed to activate}'
Aug 18 13:21:59 activate-component-hdfs[2799]: StructuredError{hdfs, Component hdfs failed to activate}
Aug 18 13:21:59 activate-component-hdfs[2799]: + exit 1
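The structured error is buried in that trace. One way to pull out just that line from a locally saved copy of the startup output (the file name is an assumption; fetch it first, e.g. with `gsutil cp`) is:

```shell
# Print only the StructuredError line(s) from a local copy of the startup
# output; the file name is an assumption. "|| true" keeps the command from
# failing when the file is missing or nothing matches.
grep -o 'StructuredError{[^}]*}' dataproc-startup-script_output || true
```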
What am I missing?
EDIT
As @Dagang suggested, I ssh-ed into the master node and ran grep "activate-component-hdfs" /var/log/dataproc-startup-script.log. The output is here.
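Spelled out, the steps above were roughly the following (the cluster name and zone are placeholders, not from the original post):

```shell
# SSH into the master node; name and zone are placeholders.
gcloud compute ssh my-cluster-m --zone=europe-west2-a

# Then, on the master node:
grep "activate-component-hdfs" /var/log/dataproc-startup-script.log
```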
n1-standard-4 for the master and the 2 workers, with 500 GB standard persistent disks for all nodes. No custom initialization actions. The default image version is 1.3, but I want to use version 1.5. I've tried a handful of times, and all attempts failed with the same error. – tak
/var/log/dataproc-startup-script.log. – Dagang