Hadoop Nodes and Roles

Question

I have a Hadoop cluster at work with over 50 nodes. We occasionally face disk failures and need to decommission the DataNode roles.

My question is: if I were to decommission only the DataNode and leave the TaskTracker running, would this result in failed tasks/jobs on this node because the HDFS service is unavailable on that node?

221B · Accepted Answer · 2015-01-17T16:17:57

1. Does the TaskTracker on Node1 sit idle since there is no DataNode service on that node?
   Correct. If the DataNode is disabled, the TaskTracker will not be able to process the data because the data will not be available; it will be idle.

2. Or does the TaskTracker work on data from DataNodes on other nodes?
   No. Due to the data locality principle, the TaskTracker will not process data from other nodes.

3. Do we get errors from the TaskTracker service on Node1 due to the DN on its node being down?
   The TaskTracker will not be able to process any data, so there will be no errors.

4. If I have services like Hive, Impala, etc. running on HDFS, would those services throw errors upon contact with the TaskTracker on Node1?
   They will not be able to contact the TaskTracker on Node1. When a client requests processing of data, the NameNode tells the client where the data is located, so all other applications communicate with the DataNodes based on those locations.
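To illustrate the last point (the NameNode hands out data locations, and clients then talk to the DataNodes at those locations), here is a toy sketch in Python. It is not Hadoop code: `ToyNameNode`, `take_out_of_service`, and `get_block_locations` are made-up names, and the block map with three replicas per block is invented purely for illustration.

```python
# Toy model only: these names are hypothetical and are not Hadoop APIs.
from typing import Dict, List, Set


class ToyNameNode:
    """Tracks which hosts hold a replica of each block."""

    def __init__(self, block_map: Dict[str, List[str]]) -> None:
        # block id -> hosts holding a replica of that block
        self.block_map = {b: list(hosts) for b, hosts in block_map.items()}
        self.out_of_service: Set[str] = set()

    def take_out_of_service(self, host: str) -> None:
        """Stop advertising replicas on this host (e.g. its DataNode is down)."""
        self.out_of_service.add(host)

    def get_block_locations(self, block_id: str) -> List[str]:
        """Return the hosts a client should contact for this block."""
        return [h for h in self.block_map[block_id] if h not in self.out_of_service]


# Assumed example layout: two blocks, three replicas each.
namenode = ToyNameNode({
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node2", "node4", "node5"],
})
namenode.take_out_of_service("node1")

locations_blk_1 = namenode.get_block_locations("blk_1")
locations_blk_2 = namenode.get_block_locations("blk_2")
print(locations_blk_1)
print(locations_blk_2)
```

In this sketch, once node1 is taken out of service it is no longer returned as a location, so a client asking for `blk_1` is only pointed at the remaining replica hosts.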
