I'm trying to set up a Hadoop server in pseudo-distributed mode, so that map/reduce tasks are executed in parallel. Right now, when I run a job, the console outputs the following line:
Running job: job_local1508664063_0001
This means the job is running in local mode, so it is expected that all tasks run sequentially. Below is my current configuration. What do I have to edit so that Hadoop runs map tasks / reduce tasks in parallel? (I start the Hadoop server using start-dfs and start-yarn.)
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>mymachine:54311</value>
    <description>The host and port that the MapReduce job tracker runs
      at. If "local", then jobs are run in-process as a single map
      and reduce task.
    </description>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>mymachine:50030</value>
    <description>The host and port that the MapReduce job tracker runs
      at. If "local", then jobs are run in-process as a single map
      and reduce task.
    </description>
  </property>
</configuration>
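For completeness, my yarn-site.xml is close to the defaults; this is roughly what it looks like (a simplified sketch, so some entries may differ on my machine; the hostname is the same mymachine as above):

```xml
<configuration>
  <!-- Standard setting so MapReduce's shuffle phase runs on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Hostname assumed to match the one used in mapred-site.xml -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>mymachine</value>
  </property>
</configuration>
```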
mymachine is the hostname of the server. I have also tried using the IP address, with the same result: the job manager still treats the server as "local". The current job creates 12 map tasks, and they are run sequentially.
The same behaviour is reported in this thread:
stackoverflow.com/questions/26267476/why-my-map-reduce-job-is-running-sequentially
PS: to be sure the configs are loaded, my Java webservice also sets them redundantly with:
conf.set("mapreduce.jobtracker.address", "mymachine:54311");
conf.set("mapreduce.jobtracker.http.address", "mymachine:50030");
And I also set the resources to allow multiple containers, and therefore parallel map tasks (the machine is an i7 with 4 cores / 8 threads and 8 GB RAM):
conf.set("yarn.nodemanager.resource.memory-mb", "6144");
conf.set("yarn.nodemanager.resource.cpu-vcores", "8");
conf.set("yarn.scheduler.minimum-allocation-mb", "1024");
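For clarity, these are the same resource settings expressed as a yarn-site.xml fragment (same property names and values as the conf.set calls above; I have not yet placed them in the file itself, only set them client-side):

```xml
<!-- Equivalent of the conf.set calls above, as yarn-site.xml properties -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6144</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```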
How should I modify my configuration? My Hadoop version is 2.7.1.