I have a Hadoop (version 2.5.0) cluster with 3 machines.
Topology:
10.0.0.1  NameNode, DataNode
10.0.0.2  DataNode
10.0.0.3  DataNode
It is configured as follows:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://10.0.0.1/</value>
<final>true</final>
</property>
</configuration>
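Note that the fs.defaultFS URI omits the port, so clients fall back to the default NameNode RPC port (8020 in Hadoop 2.x). Spelling it out makes the endpoint unambiguous; a sketch, assuming the default port is what you intend:

```xml
<property>
  <name>fs.defaultFS</name>
  <!-- 8020 is the Hadoop 2.x default NameNode RPC port -->
  <value>hdfs://10.0.0.1:8020/</value>
  <final>true</final>
</property>
```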
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/tuannd/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/tuannd/hdfs/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
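As an aside, in Hadoop 2.x the dfs.permissions key is deprecated in favour of dfs.permissions.enabled; the old name still works through the deprecation mapping, but the current spelling is:

```xml
<property>
  <!-- Hadoop 2.x name for the deprecated dfs.permissions key -->
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
```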
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>10.0.0.1:9001</value>
<final>true</final>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/tmp/hadoop/mapreduce/system</value>
<final>true</final>
</property>
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>7</value>
<final>true</final>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>7</value>
<final>true</final>
</property>
<property>
<name>mapreduce.job.maps</name>
<value>100</value>
</property>
<property>
<name>mapreduce.task.timeout</name>
<value>0</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx512M</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024M</value>
</property>
</configuration>
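One thing worth noting: with mapreduce.framework.name set to yarn, the jobtracker and tasktracker properties above are MRv1 settings and are ignored, since there are no JobTracker/TaskTracker daemons under YARN. Per-task container sizes are instead controlled by the mapreduce.*.memory.mb properties, which should be somewhat larger than the corresponding -Xmx to leave room for JVM overhead. A sketch with illustrative values (not tuned recommendations):

```xml
<property>
  <name>mapreduce.map.memory.mb</name>
  <!-- container for each map task; fits the 512M heap plus overhead -->
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <!-- container for each reduce task; fits the 1024M heap plus overhead -->
  <value>2048</value>
</property>
```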
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
</configuration>
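With 8192 MB per NodeManager and YARN's default minimum allocation of 1024 MB, each node can run at most eight 1 GB containers. If you want the scheduler limits to be explicit rather than implicit, they can be stated in yarn-site.xml (the values below are the Hadoop 2.x defaults, shown only to make the limits visible):

```xml
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```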
slaves
10.0.0.1
10.0.0.2
10.0.0.3
After running start-all.sh, jps reports the following on the master:
19817 Jps
15240 ResourceManager
12521 SecondaryNameNode
12330 DataNode
12171 NameNode
15381 NodeManager
On the slaves:
24454 NodeManager
22828 DataNode
24584 Jps
The wordcount code is the same as in this link, run with the same input data:
- In Eclipse (on the master machine): processing takes 9 s.
- On the Hadoop cluster: processing takes 30 s.
What is wrong in my Hadoop cluster configuration files? Processing on the Hadoop cluster is slower than in Eclipse!
Thanks.