I am trying to understand the "concept" of connecting to a remote server. I have four servers running CentOS with CDH 5.4, and I want to run Spark on YARN across all four nodes. My problem is that I do not understand how to set HADOOP_CONF_DIR as specified in the documentation. Where should I set this variable, and what value should it have? Do I need to set it on all four nodes, or is setting it on the master node enough?
The documentation says: "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster." I read many similar questions before asking this here. Please let me know how I can solve this problem. I am able to run Spark and PySpark in standalone mode on all nodes.
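For context, this is roughly what I have been trying before launching Spark. I am guessing that CDH's usual client config path, /etc/hadoop/conf, is the right value; part of my question is whether that guess is correct:

```shell
# Guess: CDH normally places the client-side Hadoop configs
# (core-site.xml, yarn-site.xml, etc.) under /etc/hadoop/conf.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf

# Confirm what the shell will pass to Spark:
echo "$HADOOP_CONF_DIR"

# Then submit against YARN from this machine, e.g.:
# spark-submit --master yarn ...
```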
Thanks for your help. Ashish