I am trying to access a firewalled Hadoop cluster running YARN via a SOCKS proxy. The cluster itself does not use proxied connections; only my client, running on a local machine (e.g. a laptop), is connected via ssh -D 9999 user@gateway-host to a machine that can see the Hadoop cluster.
In the Hadoop configuration file core-site.xml (on my laptop) I have the following lines:
<property>
<name>hadoop.socks.server</name>
<value>localhost:9999</value>
</property>
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
Accessing HDFS this way works great. However, when I try to submit a YARN job, it fails and I can see in the logs that the nodes are not able to talk to each other:
java.io.IOException: Failed on local exception: java.net.SocketException: Connection refused; Host Details : local host is: "host1"; destination host is: "host2":8030;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
where host1 and host2 are both part of the Hadoop cluster.
My guess is that the Hadoop nodes are trying to communicate through a SOCKS proxy as well, which fails because no proxy server is running on the individual hosts. Is there a way to fix this apart from setting up a dedicated proxy server?
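To make the guess above concrete, here is a toy Python model of how I imagine the configuration layering works. It is not Hadoop code. It assumes that the job configuration from my laptop is loaded on the nodes after their own core-site.xml, and it applies the documented Hadoop rule that a property marked final in an earlier resource cannot be overridden by a later one. The function name merge_resources and the dictionaries are hypothetical, and StandardSocketFactory is used as the nodes' value because it is the shipped default.

```python
def merge_resources(resources):
    """Merge config resources in load order, honoring 'final' flags.

    Each resource maps a property name to a (value, is_final) pair.
    Later resources override earlier ones unless an earlier resource
    declared the property final.
    """
    merged = {}
    finals = set()
    for resource in resources:
        for name, (value, is_final) in resource.items():
            if name in finals:
                continue  # locked by an earlier resource
            merged[name] = value
            if is_final:
                finals.add(name)
    return merged


KEY = "hadoop.rpc.socket.factory.class.default"
STANDARD = "org.apache.hadoop.net.StandardSocketFactory"
SOCKS = "org.apache.hadoop.net.SocksSocketFactory"

# What a cluster node has locally (assumed: the default, not final).
node_site = {KEY: (STANDARD, False)}

# What my laptop would ship with the job (from my core-site.xml).
job_conf = {
    KEY: (SOCKS, False),
    "hadoop.socks.server": ("localhost:9999", False),
}

# Case 1: the node's value is not final, so the job's value wins.
effective = merge_resources([node_site, job_conf])

# Case 2: the node's value is marked final, so it survives.
node_site_final = {KEY: (STANDARD, True)}
effective_final = merge_resources([node_site_final, job_conf])

print(effective[KEY])
print(effective_final[KEY])
```

If this model matches what really happens, the first case would explain the error: the nodes end up with SocksSocketFactory and try to reach a proxy at localhost:9999 that does not exist on them. The second case suggests that marking the property final on the cluster side might prevent the override, but I have not verified that.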