2
votes

I was following this tutorial trying to install and configure Spark on my cluster. My cluster (5 nodes) is hosted on AWS and was set up through Cloudera Manager.

The tutorial says to "Sync the contents of /etc/spark/conf to all nodes" after modifying the configuration file.

I am really wondering what the easiest way is to make that happen. I read a post with a similar question to mine HERE. From my understanding, the configuration files for Hadoop, HDFS, etc. are managed by ZooKeeper or Cloudera Manager, so in those cases CM's deploy action or ZooKeeper could push the changes out.

However, Spark's configuration file is entirely outside ZooKeeper's scope. How can I "sync" it to the other nodes?
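The only workaround I can think of is to script an rsync push from the node where I edit the files, roughly like the sketch below (the worker hostnames are placeholders, and it assumes passwordless SSH with write access to /etc/spark/conf and rsync installed on all nodes), but then I would have to remember to rerun it after every change:

    #!/usr/bin/env python3
    # Push /etc/spark/conf to every worker node over SSH.
    # Sketch only: hostnames are placeholders; assumes passwordless SSH
    # as a user that can write to /etc/spark/conf, and rsync on all nodes.
    import subprocess

    WORKERS = ["node2", "node3", "node4", "node5"]  # placeholder hostnames
    CONF_DIR = "/etc/spark/conf/"  # trailing slash: copy the directory's contents

    for host in WORKERS:
        # -a preserves ownership/permissions/timestamps; --delete drops stale files
        subprocess.run(
            ["rsync", "-a", "--delete", CONF_DIR, f"{host}:{CONF_DIR}"],
            check=True,
        )
        print(f"synced {CONF_DIR} to {host}")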

Many thanks!

1
How about using EMR for an easy setup: aws.amazon.com/articles/4926593393724923 - Guy

1 Answer

0
votes

Why don't you mount the same EBS volume via NFS at /etc/spark/conf (or one of its parent directories) so the files are automatically in sync?
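For illustration, here is an untested sketch of what each worker would run (the NFS server hostname and export path are placeholders; it requires root and the NFS client utilities, and assumes the conf directory is exported from one node):

    #!/usr/bin/env python3
    # Mount a shared NFS export over /etc/spark/conf on this node.
    # Sketch only: the server hostname and export path are placeholders.
    import subprocess

    EXPORT = "nfs-server:/exports/spark-conf"  # placeholder export
    MOUNT_POINT = "/etc/spark/conf"

    # Equivalent to: mount -t nfs -o ro nfs-server:/exports/spark-conf /etc/spark/conf
    # Mounted read-only on workers; edit the files on the NFS server itself.
    subprocess.run(
        ["mount", "-t", "nfs", "-o", "ro", EXPORT, MOUNT_POINT],
        check=True,
    )
    print(f"mounted {EXPORT} at {MOUNT_POINT}")

To make the mount survive reboots, you would add a matching entry to /etc/fstab on each node.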