4
votes

I am fairly new to Hadoop. To run some benchmarks, I need a variety of Hadoop configurations for comparison.

I want to know how to remove a Hadoop slave from DFS (no longer running the datanode daemon) but not from Mapred (still running the tasktracker), or vice versa. AFAIK, there is a single slaves file for such Hadoop nodes, not separate slaves files for DFS and Mapred.

Currently, I start both DFS and Mapred on the slave node and then kill the datanode on that slave. But it takes a while for the node to show up under 'dead nodes' in the HDFS GUI. Is there a parameter that can be tuned to shorten this timeout?

Thanks

2 Answers

7
votes

Try using dfs.hosts and dfs.hosts.exclude in hdfs-site.xml, and mapred.hosts and mapred.hosts.exclude in mapred-site.xml. Each property names a file listing the hosts that are allowed to connect to (or are excluded from connecting to) the NameNode and the JobTracker, respectively.

Once the list of nodes in those files has been updated appropriately, the NameNode and the JobTracker have to be refreshed using the hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes commands, respectively.
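
As a rough sketch of what that configuration could look like (the file paths under /etc/hadoop/conf are placeholders I made up, so adjust them to your installation), each property goes inside the <configuration> element of its own file:

```xml
<!-- In hdfs-site.xml: hosts listed in this file are excluded from HDFS.
     The path is a hypothetical example. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

<!-- In mapred-site.xml: hosts listed in this file are excluded from MapReduce.
     The path is a hypothetical example. -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/conf/mapred.exclude</value>
</property>
```

To take a slave out of DFS only, you would put its hostname (one per line) in the DFS exclude file, leave the Mapred exclude file empty, and then run the DFS refresh command above. Swap the two files for the opposite case.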

0
votes

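Instead of using the slaves file to start all processes on your cluster, you can start only the required daemons on each machine if you have just a few nodes, for example by running hadoop-daemon.sh start tasktracker on a node that should run the tasktracker but not the datanode.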