
I set up a Hadoop environment in pseudo-distributed mode (on OSX). The snippets below are my configuration files.

○core-site.xml

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>

○mapred-site.xml

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>127.0.0.1:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>127.0.0.1:19888</value>
</property>
<property>
    <name>mapreduce.jobhistory.admin.address</name>
    <value>127.0.0.1:10033</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.https.address</name>
    <value>127.0.0.1:19890</value>
</property>
</configuration>

○yarn-site.xml

<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.webapp.https.address</name>
    <value>127.0.0.1:8044</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8031</value>
</property>
<property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>98.5</value>
</property>
<property>
    <name>yarn.nodemanager.hostname</name>
    <value>localhost</value>
</property>
</configuration>

And the jps result is below.

79464 NameNode
79562 DataNode
79696 SecondaryNameNode
79831 ResourceManager
79945 NodeManager

I could open "http://localhost:8088", so I looked job details. And I clicked The History button(link to "http://[private_ip_addr]:8088/proxy/application_xxxx/") to try to open "Tracking URL"(Below image is a job detail page), the connection was refused(The error code on google chrome is "ERR_CONNECTION_REFUSED"). job detail

I could open the NodeManager page (http://127.0.0.1:8042) as in the screenshot below, but I couldn't open "RM Home" (the URL is "http://[private_ip_addr]:8088"). (Screenshot: RM page.)
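Since the broken links point at [private_ip_addr] rather than localhost, one thing worth trying (a sketch, not from the original configuration; the value is an assumption for a single-machine setup) is pinning the ResourceManager web UI address in yarn-site.xml so that generated links use the loopback address:

<!-- yarn-site.xml: bind the RM web UI to loopback (assumed value for a
     pseudo-distributed, single-host setup) -->
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>127.0.0.1:8088</value>
</property>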

Are there any mistakes in my configuration files, or is my network environment incorrect? If you need my network information (ports, etc.), please let me know.

Thanks.

--addition--

(180506 23:00)

I checked the NodeManager log file and found that the error "Could not determine OS" had occurred. Below is a part of the log file.

2018-05-06 23:00:03,353 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
2018-05-06 23:00:03,533 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
2018-05-06 23:00:03,534 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
2018-05-06 23:00:03,642 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-05-06 23:00:03,822 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-05-06 23:00:03,822 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2018-05-06 23:00:03,932 WARN org.apache.hadoop.yarn.util.ResourceCalculatorPlugin: java.lang.UnsupportedOperationException: Could not determine OS: Failed to instantiate default resource calculator.
java.lang.UnsupportedOperationException: Could not determine OS
    at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
    at org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:41)
    at org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getResourceCalculatorPlugin(ResourceCalculatorPlugin.java:191)
    at org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl.serviceInit(NodeResourceMonitorImpl.java:73)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)

(180507 14:55)

I upgraded to Hadoop 3.0.0, and the "Could not determine OS" error went away, but the task tracking page still didn't work.

I checked the NodeManager log once again and found this message.

2018-05-07 14:53:14,803 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: 
Node Manager health check script is not available or doesn't have execute permission, so not starting the node health script runner.

Could this be the cause?
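(For reference: this INFO message typically just means that no node health check script has been configured, which is harmless by itself. A minimal sketch of how one would be configured in yarn-site.xml, where the script path is a hypothetical example:)

<!-- yarn-site.xml: optional node health check script; the path below is a
     hypothetical example and must exist with execute permission -->
<property>
    <name>yarn.nodemanager.health-checker.script.path</name>
    <value>/usr/local/hadoop/bin/health-check.sh</value>
</property>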

(180507 16:22)

I had forgotten to run the JobHistoryServer, so I executed $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver.

41446 JobHistoryServer
41672 NameNode
41779 DataNode
41924 SecondaryNameNode
42128 ResourceManager
42234 NodeManager
42772 Jps
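(Side note, not from the original post: on Hadoop 3.x, mr-jobhistory-daemon.sh is deprecated; the equivalent command is shown below.)

# Hadoop 3.x equivalent of "mr-jobhistory-daemon.sh start historyserver"
mapred --daemon start historyserver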

It looks like the JobHistoryServer is running, but I still cannot open the job tracking page.
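As a quick sanity check (not in the original post; assumes curl is available), the configured web UI ports can be probed directly:

# Probe the JobHistoryServer web UI port from mapred-site.xml
curl -I http://localhost:19888
# Probe the ResourceManager web UI
curl -I http://localhost:8088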

(180507 16:38) SOLVED

I tried to open the job history page not by clicking the History button but by accessing "http://localhost:19888" directly, and I could open the job history page!! (Screenshot: job history page.)

The causes might be:

  1. The property settings in mapred-site.xml and the other files (as Mr. Phani Kumar Yadavilli suggested)
  2. Whether the JobHistoryServer process is running
  3. Accessing the page by its URL instead of via the History button
  4. The version of Hadoop

1 Answer


You haven't specified the version of Hadoop you are using. There is a JIRA for this issue, and the fix is available from v2.9 onwards:

https://issues.apache.org/jira/browse/YARN-4330?devStatusDetailDialog=repository

You can try setting the below parameters as per your system configuration.

There are two kinds of calculators currently available in YARN – the DefaultResourceCalculator and the DominantResourceCalculator.

The DefaultResourceCalculator only takes memory into account when doing its calculations. This is why CPU requirements are ignored when carrying out allocations in the CapacityScheduler by default. All the math of allocations is reduced to just examining the memory required by resource-requests and the memory available on the node that is being looked at during a specific scheduling-cycle.

In order to enable CPU scheduling, there are some configuration properties that administrators and users need to be aware of.

yarn.scheduler.capacity.resource-calculator: To enable CPU scheduling in the CapacityScheduler, this should be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in the capacity-scheduler.xml file.

yarn.nodemanager.resource.cpu-vcores: Set to the appropriate number in yarn-site.xml on all the nodes. This is strictly dependent on the type of workloads running in a cluster, but the general recommendation is that admins set it equal to the number of physical cores on the machine.

The MapReduce framework has its own configurations that users should use in order to take advantage of CPU scheduling in YARN, as sketched below.
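A minimal sketch of these two settings (the vcore value is an assumed example; set it per your hardware):

<!-- capacity-scheduler.xml: use the DominantResourceCalculator so CPU is
     taken into account during scheduling -->
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<!-- yarn-site.xml: CPU capacity this NodeManager advertises; 4 is an
     assumed example, usually the number of physical cores -->
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>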

mapreduce.map.cpu.vcores: Set to the number of vcores required for each map task.

mapreduce.reduce.cpu.vcores: Set to the number of vcores required for each reduce task.

yarn.app.mapreduce.am.resource.cpu-vcores: Set to the number of vcores the MR AppMaster needs.
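For example, in mapred-site.xml (the values of 1 are assumed placeholders):

<!-- mapred-site.xml: vcore requests for map tasks, reduce tasks, and the
     MR ApplicationMaster; the values are assumed examples -->
<property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
    <value>1</value>
</property>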