
I have no trouble SSHing into a Google Cloud Compute Engine VM, but I am unable to SSH into the master node of a Google Cloud Dataproc cluster.

Specifically,

gcloud compute ssh my-vm

works just fine, while

gcloud compute ssh mycluster-m

fails with the error message:

[email protected]: Permission denied (publickey).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
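
For extra diagnostics, anything placed after -- is passed straight through to the underlying /usr/bin/ssh, so a verbose attempt can be made with something like the following (the zone here is just a placeholder for the cluster's actual zone):

gcloud compute ssh mycluster-m \
    --zone=us-central1-b \
    -- -vvv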

The Compute Engine VM and the Dataproc cluster are in the same project. From the error message I understand it is something related to the SSH keys, but I am not sure how to fix it. I checked the SSH keys in the project via the Cloud Console and they look correct, and I tried the usual gcloud auth login to reset the gcloud login details.

Any hints on how to fix this?

Edit: I am trying to SSH from my machine, not from the Cloud Console. That's a good point; I will try the console and see if that works. But in the end I want to use SSH to connect to a Jupyter notebook from my local computer, so the console does not solve the problem of being unable to SSH from my machine to the VM.
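
To make the end goal concrete, what I eventually want is an SSH tunnel from my laptop to the notebook on the master node, something along these lines (port 8123 is an assumption on my part; the actual port depends on how the init action starts the notebook server, and -L/-N are again passed through to ssh):

gcloud compute ssh mycluster-m \
    --zone=us-central1-b \
    -- -L 8123:localhost:8123 -N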

Concerning the command to create the Dataproc cluster: I use tools from the hail dataproc Python library, but these are basically just convenience wrappers around the underlying gcloud commands, and it is the gcloud compute ssh step that is failing. The command I used to create the Dataproc cluster was:

gcloud beta dataproc clusters create \
    test \
    --image-version=1.4-debian9 \
    --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
    --initialization-actions=gs://hail-common/hailctl/dataproc/0.2.53/init_notebook.py \
    --metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.53/hail-0.2.53-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|dill>=0.3.1.1,<0.4|gcsfs==0.2.1|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* \
    --master-machine-type=n1-highmem-8 \
    --master-boot-disk-size=100GB \
    --num-master-local-ssds=0 \
    --num-preemptible-workers=0 \
    --num-worker-local-ssds=0 \
    --num-workers=2 \
    --preemptible-worker-boot-disk-size=40GB \
    --worker-boot-disk-size=40GB \
    --worker-machine-type=n1-standard-8 \
    --initialization-action-timeout=20m \
    --labels=creator=my_name \
    --max-idle=10m
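
Setting the hail-specific properties, metadata and init action aside, the core of that command is just the following; the ^|||^ prefix only switches the list delimiter for --properties and --metadata from a comma to |||:

gcloud beta dataproc clusters create test \
    --image-version=1.4-debian9 \
    --master-machine-type=n1-highmem-8 \
    --master-boot-disk-size=100GB \
    --worker-machine-type=n1-standard-8 \
    --worker-boot-disk-size=40GB \
    --num-workers=2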

This does not look like a programming question. - Martin Prikryl
Sorry, I found other answers about connectivity to Google Cloud Compute Engine via SSH on Stack Overflow (e.g. stackoverflow.com/questions/42167596/…), let me know if there is a more appropriate place to put it. - lmrta
Could you share the command that you used to create the Dataproc cluster? Did you specify a custom service account? - Igor Dvorzhak
Does the issue occur when you try to ssh from your machine, or in Cloud Console? - lukaszberwid

1 Answer


It turns out the problem is that the cluster creates an account called my_username on the cluster's master VM, while I am logged into my laptop as a user called 'admin'. There is a mismatch between the account name expected at the destination and the key being offered, so the login fails.

This can be fixed by adding the username to the gcloud command:

gcloud compute ssh my_username@mycluster-m
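
If you don't want to type the account name every time, a small shell wrapper also does the job (the function name dpssh and the zone are placeholders of my own):

dpssh() {
    gcloud compute ssh "my_username@$1" --zone=us-central1-b
}

dpssh mycluster-m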

That said, I still don't really understand why the SSH keys behave differently for the Dataproc master VM and a plain Compute Engine VM, so I'd be happy if someone could enlighten me.