5 votes

With the standard Dataproc image 1.5 (Debian 10, Hadoop 2.10, Spark 2.4), a Dataproc cluster cannot be created. The region is set to europe-west2.
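For reference, a minimal sketch of the kind of creation command involved (the cluster name is a placeholder, not from the question):

gcloud dataproc clusters create my-cluster \
    --region europe-west2 \
    --image-version 1.5-debian10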

The Stackdriver log says:

"Failed to initialize node <name of cluster>-m: Component hdfs failed to activate See output in: gs://.../dataproc-startup-script_output"

Scanning through the output (gs://.../dataproc-startup-script_output), I can see that the HDFS activation has failed:

Aug 18 13:21:59 activate-component-hdfs[2799]: + exit_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -ne 0 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + log_and_fail hdfs 'Component hdfs failed to activate' 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local component=hdfs
Aug 18 13:21:59 activate-component-hdfs[2799]: + local 'message=Component hdfs failed to activate'
Aug 18 13:21:59 activate-component-hdfs[2799]: + local error_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local client_error_indicator=
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -eq 2 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 'StructuredError{hdfs, Component hdfs failed to activate}'
Aug 18 13:21:59 activate-component-hdfs[2799]: StructuredError{hdfs, Component hdfs failed to activate}
Aug 18 13:21:59 activate-component-hdfs[2799]: + exit 1
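To pull the full startup output down for inspection, something like the following should work (the bucket path is truncated in the question, so substitute your own):

gsutil cat 'gs://<your-staging-bucket>/<path>/dataproc-startup-script_output' | less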

What am I missing?

EDIT

As @Dagang suggested, I SSHed into the master node and ran grep "activate-component-hdfs" /var/log/dataproc-startup-script.log. The output is here.
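For anyone reproducing this, a sketch of those commands (cluster name and zone are placeholders):

gcloud compute ssh <cluster-name>-m --zone=<zone>
grep "activate-component-hdfs" /var/log/dataproc-startup-script.log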

A few questions: Does it happen in a consistent manner? What is the size of the cluster, and which machines are you using? Are there any additional initialization actions you have added? – David Rabinowitz
For this, I'm using all the default options except the image: n1-standard-4 for the master and the 2 workers, 500 GB standard persistent disks for all the nodes, and no custom initialization actions. The default image is version 1.3, but I want to use version 1.5. I've tried a handful of times, but all of them failed with the same error. – tak
You should be able to find the failure reason in the log; just filter by "activate-component-hdfs". You can also SSH into the master node and then check /var/log/dataproc-startup-script.log. – Dagang
I tried, but couldn't reproduce the problem with 1.5. – Dagang
Hi @tak, I'm afraid I was not able to reproduce this on a 1.5 cluster. Can you please add the log that Dagang asked for to the question? – David Rabinowitz

1 Answer

3 votes

So the problem is that there is a user named "pete{", on which the hadoop fs -mkdir -p command failed. Usernames containing special characters, particularly brackets and braces such as "()[]{}", can cause the HDFS activation step to fail during cluster creation.
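As an illustration (a sketch, not the exact activation script, and assuming the failure stems from glob expansion of the path): the activation step creates an HDFS home directory per local user, and hadoop fs glob-expands its path arguments, where "{" is a glob metacharacter, so a command along these lines fails:

hadoop fs -mkdir -p '/user/pete{'    # fails: "{" is treated as an unclosed glob group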

So the easy solution is just to remove the accidentally created users.
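A hedged sketch of how to find and delete such an account on the affected node, assuming it is a local Unix account (the "pete{" name comes from this question; adjust for your own environment):

cut -d: -f1 /etc/passwd | grep -E '[][(){}]'    # list usernames containing brackets or braces
sudo userdel 'pete{'                            # remove the offending account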