6 votes

I may be searching with the wrong terms, but Google is not telling me how to do this. The question is: how can I restart Hadoop services on Dataproc after changing some configuration files (YARN properties, etc.)?

Services have to be restarted in a specific order throughout the cluster. There must be scripts or tools out there, hopefully in the Dataproc installation, that I can invoke to restart the cluster.


2 Answers

5 votes

Configuring properties is a common and well-supported use case.

You can do this via cluster properties at creation time; no daemon restart is required. Example:

gcloud dataproc clusters create my-cluster --properties yarn:yarn.resourcemanager.client.thread-count=100
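The prefix before the colon selects the target config file (yarn: maps to yarn-site.xml, core: to core-site.xml, and so on), and multiple properties can be passed as a comma-separated list. A sketch, with an assumed region and an illustrative second property:

gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='yarn:yarn.resourcemanager.client.thread-count=100,core:fs.trash.interval=60'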

If you're doing something more advanced, such as updating service log levels, then you can use systemctl to restart individual services.
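As a rough sketch of the log-level case, assuming the stock config path /etc/hadoop/conf/log4j.properties and the default hadoop.root.logger line (both may differ on your image version):

# Assumed path and property name; verify against your image before running.
sudo sed -i 's/^hadoop.root.logger=.*/hadoop.root.logger=DEBUG,console/' /etc/hadoop/conf/log4j.properties
sudo systemctl restart hadoop-yarn-resourcemanager.service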

First, SSH to a cluster node and run systemctl to see the list of available services. Then, for example, to restart the HDFS NameNode, run sudo systemctl restart hadoop-hdfs-namenode.service.
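To narrow the listing down to the Hadoop daemons, systemctl accepts a glob pattern (adding --all also shows units that are currently inactive):

systemctl list-units --all 'hadoop-*'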

If this is done as part of an initialization action, then sudo is not needed, since initialization actions run as root.
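For reference, a minimal initialization-action sketch that restarts the appropriate daemon per node role; it assumes the dataproc-role metadata attribute and the get_metadata_value helper found on Dataproc images:

#!/bin/bash
# Sketch: restart the right YARN daemon based on this node's role.
# dataproc-role is assumed to be 'Master' on master nodes, 'Worker' on workers.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
  systemctl restart hadoop-yarn-resourcemanager.service
else
  systemctl restart hadoop-yarn-nodemanager.service
fi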

-1 votes

On master nodes:

sudo systemctl restart hadoop-yarn-resourcemanager.service
sudo systemctl restart hadoop-hdfs-namenode.service

On worker nodes:

sudo systemctl restart hadoop-yarn-nodemanager.service
sudo systemctl restart hadoop-hdfs-datanode.service

After that, you can use systemctl status <name> to check a service's status; also check the logs in /var/log/hadoop.
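For example, a quick check of the NameNode after a restart (journalctl shows the systemd-captured output; the exact file layout under /var/log/hadoop varies by image):

sudo systemctl status hadoop-hdfs-namenode.service --no-pager
sudo journalctl -u hadoop-hdfs-namenode.service -n 50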