How can I change the number of data nodes, that is, disable and enable certain data nodes, to test scalability? To be more clear: I have 4 data nodes, and I want to measure the performance with 1, 2, 3 and 4 data nodes, one configuration at a time. Would it be enough to just update the slaves file on the namenode?
The correct way to temporarily decommission a node:
- Create an "exclude file". This lists the hosts, one per line, that you wish to remove.
- Set `dfs.hosts.exclude` (in hdfs-site.xml) and `mapred.hosts.exclude` (in mapred-site.xml) to the location of this file.
- Update the namenode and jobtracker by running `hadoop dfsadmin -refreshNodes` and `hadoop mradmin -refreshNodes`.
- This starts the decommissioning process. All of the data replicated on those nodes will be copied off them and onto other nodes. You can check the progress in the web UI.
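As a rough sketch, the steps above could be scripted in Python so you can step through 1, 2, 3 and 4 active nodes. This assumes the namenode and jobtracker run on the same machine as the script, that the path you pass is the file configured in `dfs.hosts.exclude` and `mapred.hosts.exclude`, and that `hadoop` is on the PATH. The helper names and host names are hypothetical.

```python
import subprocess
from pathlib import Path

# The two refresh commands from the steps above.
REFRESH_COMMANDS = [
    ["hadoop", "dfsadmin", "-refreshNodes"],
    ["hadoop", "mradmin", "-refreshNodes"],
]


def hosts_to_exclude(all_hosts, active_count):
    """Keep the first `active_count` hosts in service and exclude the rest."""
    if not 1 <= active_count <= len(all_hosts):
        raise ValueError("active_count must be between 1 and len(all_hosts)")
    return list(all_hosts[active_count:])


def exclude_file_text(hosts):
    """Format hosts one per line, as the exclude file expects."""
    return "".join(f"{host}\n" for host in hosts)


def apply_exclusions(exclude_file, hosts):
    """Overwrite the exclude file, then ask both masters to re-read it.

    Assumption: this runs on the machine hosting the namenode and jobtracker,
    since each daemon reads the exclude file from its own local filesystem.
    """
    Path(exclude_file).write_text(exclude_file_text(hosts))
    for cmd in REFRESH_COMMANDS:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `apply_exclusions("/path/to/excludes", hosts_to_exclude(["dn1", "dn2", "dn3", "dn4"], 2))` would leave dn1 and dn2 in service; the path and host names are placeholders. Splitting out the pure helpers keeps the host selection and file format easy to check before touching a live cluster.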
Note that those nodes stop being used for MapReduce jobs as soon as you run `hadoop mradmin -refreshNodes`, but they still hold data until decommissioning completes, so if you run something before then you might eat some network latency that you otherwise wouldn't. For a fully realistic test, wait until decommissioning is finished.
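If you would rather script the wait than watch the web UI, one option is to poll `hadoop dfsadmin -report` until no datanode reports that decommissioning is still in progress. This is a minimal sketch: it assumes the report contains a `Decommission Status : Decommission in progress` line for each node that is still draining (verify this against your Hadoop version's output), and the poll interval and timeout are arbitrary defaults.

```python
import subprocess
import time

# Assumption: `hadoop dfsadmin -report` prints a per-datanode line like
# "Decommission Status : Decommission in progress" while a node is draining.
IN_PROGRESS_MARKER = "Decommission in progress"


def nodes_in_progress(report_text):
    """Count datanodes whose status line says decommissioning is still running."""
    return sum(
        1
        for line in report_text.splitlines()
        if line.strip().startswith("Decommission Status")
        and IN_PROGRESS_MARKER in line
    )


def wait_for_decommission(poll_seconds=30, timeout_seconds=6 * 3600):
    """Poll the namenode report until no node is mid-decommission."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        report = subprocess.run(
            ["hadoop", "dfsadmin", "-report"],
            check=True,
            capture_output=True,
            text=True,
        ).stdout
        remaining = nodes_in_progress(report)
        if remaining == 0:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{remaining} node(s) still decommissioning")
        time.sleep(poll_seconds)
```

Calling `wait_for_decommission()` between changing the exclude file and starting a benchmark run keeps the measurement from overlapping with the data migration.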
To add the nodes back, simply remove them from the exclude file and run both -refreshNodes commands again.
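Re-admitting nodes can be scripted the same way. This sketch uses hypothetical helper names and assumes the script runs where the exclude file lives and that `hadoop` is on the PATH. It removes only the named hosts and keeps any other exclusions in place, so you can go from 2 to 3 active nodes without rewriting the whole file.

```python
import subprocess
from pathlib import Path


def remaining_exclusions(current_text, hosts_to_readmit):
    """Return exclude-file text with the given hosts removed (others kept)."""
    readmit_set = set(hosts_to_readmit)
    kept = [
        line.strip()
        for line in current_text.splitlines()
        if line.strip() and line.strip() not in readmit_set
    ]
    return "".join(f"{host}\n" for host in kept)


def readmit(exclude_file, hosts):
    """Drop hosts from the exclude file, then re-run both -refreshNodes commands."""
    path = Path(exclude_file)
    path.write_text(remaining_exclusions(path.read_text(), hosts))
    for cmd in (
        ["hadoop", "dfsadmin", "-refreshNodes"],
        ["hadoop", "mradmin", "-refreshNodes"],
    ):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Blank lines and surrounding whitespace in the existing file are dropped when it is rewritten, which keeps the one-host-per-line format intact.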