We have a Hadoop cluster (HDP 2.6.4 with Ambari, with 5 datanode machines).
We are running a Spark Streaming application (Spark 2.1 on Hortonworks 2.6.x).
Currently, the Spark Streaming application runs on all of the datanode machines.
As some may know, with YARN node labels we can restrict the Spark Streaming application to run only on the first two datanode machines.
So, for example, if we configure a YARN node label on the first two datanodes, the Spark application will not run on the other three datanodes, because those machines do not carry the label.
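For context, a minimal sketch of that node-label setup might look like the following. The label name "spark" and the hostnames datanode1/datanode2 are assumptions for illustration, and the queue running the job would also need access to the label (capacity-scheduler accessible-node-labels), which is omitted here:

```shell
# Assumed prerequisite (e.g. via Ambari -> YARN -> Custom yarn-site):
#   yarn.node-labels.enabled=true
#   yarn.node-labels.fs-store.root-dir=hdfs:///yarn/node-labels

# Create an exclusive label and assign it to the first two datanodes
# ("spark", "datanode1", "datanode2" are example names, not from the cluster):
yarn rmadmin -addToClusterNodeLabels "spark(exclusive=true)"
yarn rmadmin -replaceLabelsOnNode "datanode1=spark datanode2=spark"

# Submit the streaming job so the AM and executors only land on labeled nodes:
spark-submit --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=spark \
  --conf spark.yarn.executor.nodeLabelExpression=spark \
  my_streaming_app.jar
```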
My question is: is it also possible, by means of YARN node labels, to disable HDFS on the last three datanode machines (in order to avoid any HDFS replicas being placed on those three nodes)?
Reference - http://crazyadmins.com/configure-node-labels-on-yarn/