Is this the way Hadoop works?
The client submits a MapReduce job/program to the NameNode.
The JobTracker (which resides on the NameNode) allocates tasks to the slave TaskTrackers running on the individual worker machines (DataNodes).
Each TaskTracker is responsible for executing and managing the individual tasks assigned to it by the JobTracker.
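For reference, this is roughly the driver code I would use to submit such a job from the client (a minimal sketch using the org.apache.hadoop.mapreduce API; the class name, mapper/reducer, and input/output paths are placeholders for my real ones):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitToCluster {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / mapred-site.xml from the classpath,
        // so the job is submitted to the configured cluster.
        Configuration conf = new Configuration();
        Job job = new Job(conf, "example-job");
        job.setJarByClass(SubmitToCluster.class);
        // job.setMapperClass(MyMapper.class);   // placeholders for my real classes
        // job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output in HDFS
        // The client only submits and waits; the map/reduce tasks themselves
        // run on the TaskTrackers (slave nodes).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```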
According to the above scenario, the MapReduce program runs on the slave nodes. Does that mean the job will consume the slave nodes' computation/processing power?
What if I want to use another machine (independent of the Hadoop installation) to execute the MapReduce job while still using the Hadoop cluster's data?
Why would I still use the Hadoop cluster at all? Because Hadoop distributes large data very efficiently across its DataNodes.
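In other words, I would keep HDFS purely for storage. As far as I understand, any machine that can reach the NameNode can read that data directly through the FileSystem API, without going through MapReduce at all (a sketch; the NameNode host/port and file path are placeholders for my setup, and `fs.default.name` is the old-style property name):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point at the cluster's NameNode (placeholder host/port).
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);
        // Stream a file that HDFS has distributed across the DataNodes.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/input/part-00000"))));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line); // process each record locally on this machine
        }
        reader.close();
        fs.close();
    }
}
```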
The new scenario would be as follows:
a. Server
b. Client
a.1) Distribute the unordered data using the Hadoop cluster.
b.1) The client will execute (not submit to the NameNode) a MapReduce job that reads its data from the Hadoop cluster's DataNodes. If that is possible, what happens to the JobTracker (NameNode) and the TaskTrackers (DataNodes)?
I know I am ignoring a major part of Hadoop here by executing the job on the client machine, but that is my project requirement. Any suggestions?
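To make b.1 concrete, what I have in mind is something like running the job with the LocalJobRunner on the client machine while the input and output still live in the cluster's HDFS (a sketch assuming the MRv1-era property names and a placeholder NameNode address):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RunOnClientAgainstHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:9000"); // data stays in HDFS
        conf.set("mapred.job.tracker", "local");                  // run with LocalJobRunner
        Job job = new Job(conf, "client-side-job");
        job.setJarByClass(RunOnClientAgainstHdfs.class);
        // job.setMapperClass(MyMapper.class);   // placeholders for my real classes
        // job.setReducerClass(MyReducer.class);
        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));
        // With "mapred.job.tracker" set to "local", all map and reduce work runs
        // in this single JVM on the client; the JobTracker and TaskTrackers are
        // not involved, only the NameNode/DataNodes serving the data.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

I realise this gives up data locality, since every record has to travel over the network to the client machine.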