2
votes

Is there a proper way to execute a shell script on every node in a running EMR hadoop cluster?

Everything I look for brings up bootstrap actions, but that only applies to when the cluster is starting, not for a running cluster.

My application is using python, so my current guess is to use boto to list the IPs of each node in the cluster, then loop through each node and execute the shell script via ssh.

Is there a better way?

1

1 Answers

0
votes

If your cluster is already started, you should use steps.

The steps are executed after the cluster is started, so technically it appears to be what you are looking for.

Be careful, steps are executed only on the master node, you should connect to the rest of your nodes in some way for modifyng them.

Steps are scripts as well, but they run only on machines in the Master-Instance group of the cluster. This mechanism allows applications like Zookeeper to configure the master instances and allows applications like Hbase and Apache Drill to configure themselves.

Reference

See this also.