
I've installed Hadoop 3.2.1 on Ubuntu 20.04 in VirtualBox for my college studies. I'm new to Hadoop and have a course deadline coming up. I've searched several sources on the internet for how to run a MapReduce job on Hadoop.

But when I type this in the terminal:

hadoop jar '/home/tamminen/WordCountTutorial/firstTutorial.jar' WordCount /WordCountTutorial/Input /WordCountTutorial/Output

which follows the format:

hadoop jar <JAR_FILE> <CLASS_NAME> <HDFS_INPUT_DIRECTORY> <HDFS_OUTPUT_DIRECTORY>

the output looks like this:

2020-10-11 18:59:04,584 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:05,595 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:06,598 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:07,618 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:08,619 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:09,621 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:10,624 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:11,625 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:12,627 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:13,629 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:13,632 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From 18k10018-data-mining/10.0.2.15 to localhost:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 3 failover attempts. Trying to failover after sleeping for 34444ms.

Because of this, I can't run hadoop dfs -cat <HDFS_OUTPUT_DIRECTORY>* to see the output.
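"Connection refused" on localhost:8032 usually means the client reached the machine but nothing was accepting connections on that port, which is the ResourceManager's client port (yarn.resourcemanager.address). One way to check this independently of Hadoop is to attempt a plain TCP connection. A minimal sketch, assuming Python 3 is available in the VM; the function name is hypothetical and the ports are the ones from the log and configuration files in this question:

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening), timeouts,
        # and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9000: fs.defaultFS (HDFS NameNode), 8030: RM scheduler address,
    # 8032: RM client address that hadoop jar was trying to reach.
    for port in (9000, 8030, 8032):
        state = "listening" if is_listening("127.0.0.1", port) else "NOT listening"
        print(f"127.0.0.1:{port} {state}")
```

If 9000 is listening but 8032 is not, HDFS is up and the ResourceManager is not, which matches the retries in the log.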

These are the Hadoop configuration files I've changed:

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.server.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.server.groups</name>
        <value>*</value>
    </property>
</configuration> 

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
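One detail in the yarn-site.xml above may be worth checking: yarn.resourcemanager.address and yarn.resourcemanager.resource-tracker.address are both set to 127.0.0.1:8032, whereas Hadoop's yarn-default.xml gives them separate ports (8032 and 8031). Two ResourceManager services cannot bind the same port, so this could keep the ResourceManager from staying up even after start-yarn.sh. A minimal sketch that flags address properties sharing a port, assuming the usual `<property><name>…</name><value>host:port</value></property>` layout; the function name is hypothetical:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict


def find_port_collisions(xml_text: str) -> dict[str, list[str]]:
    """Map each port used by more than one *.address property to those property names."""
    root = ET.fromstring(xml_text)
    by_port = defaultdict(list)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # Only consider host:port style address properties.
        if name.endswith(".address") and ":" in value:
            port = value.rsplit(":", 1)[1]
            by_port[port].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


# The address properties from the yarn-site.xml in this question.
YARN_SITE = """<configuration>
  <property><name>yarn.resourcemanager.address</name><value>127.0.0.1:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>127.0.0.1:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>127.0.0.1:8032</value></property>
</configuration>"""

print(find_port_collisions(YARN_SITE))
# {'8032': ['yarn.resourcemanager.address', 'yarn.resourcemanager.resource-tracker.address']}
```

To read a real file instead, pass `open(path).read()` for your own `$HADOOP_HOME/etc/hadoop/yarn-site.xml`.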

and hadoop-env.sh:

...
# Extra Java runtime options for all Hadoop commands. We don't support
# IPv6 yet/still, so by default the preference is set to IPv4.
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="-Xmx5096m"  # the only line I added (besides JAVA_HOME), taken from a Hadoop tutorial
# For Kerberos debugging, an extended option set logs more information
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
...

Can anyone explain what causes this error and what I should do to get hadoop jar working?

If you run jps, is YARN running? – OneCricketeer
I think not; jps lists only 3206 DataNode, 15415 Jps, 3468 SecondaryNameNode, and 3055 NameNode. – pup_in_the_tree
That would explain the error connecting to YARN, then. I assume you ran start-yarn? – OneCricketeer
At first I hadn't run start-yarn. After your suggestion I ran it, and jps now shows: 29025 NodeManager, 3206 DataNode, 28824 ResourceManager, 29192 Jps, 3468 SecondaryNameNode, 3055 NameNode. But hadoop jar still fails with: java.net.ConnectException: Connection refused; For more details see: wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 5 failover attempts. Trying to failover after sleeping for 42681ms. – pup_in_the_tree
Have you read the suggestions in that link? – OneCricketeer
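The jps check from the comments can be scripted so that missing daemons stand out. A minimal sketch, assuming plain jps output in its usual "<pid> <name>" form; the function name and the set of expected daemons (a single-node HDFS + YARN setup) are illustrative assumptions:

```python
from __future__ import annotations

import re

# Daemons a single-node HDFS + YARN setup normally shows in jps.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager"}


def missing_daemons(jps_output: str) -> set[str]:
    """Return the expected Hadoop daemons that do not appear in jps output."""
    # Each entry is a PID followed by a class name; matching pairs works whether
    # the entries are on separate lines or pasted onto a single line.
    running = set(re.findall(r"\b\d+\s+(\w+)", jps_output))
    return EXPECTED - running


before = "3206 DataNode 15415 Jps 3468 SecondaryNameNode 3055 NameNode"
after = "29025 NodeManager 3206 DataNode 28824 ResourceManager 29192 Jps 3468 SecondaryNameNode 3055 NameNode"
print(sorted(missing_daemons(before)))  # ['NodeManager', 'ResourceManager']
print(sorted(missing_daemons(after)))   # []
```

A ResourceManager listed right after start-yarn.sh does not by itself show that it stayed up, so re-running jps a little later is a cheap extra check.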

1 Answer


This may happen because Hadoop sometimes starts services on the server's internal IP address instead of localhost or 127.0.0.1. Try replacing 127.0.0.1 with your server's actual IP address in all Hadoop config files and see if that works. Alternatively, edit /etc/hosts as root and map localhost to your server's actual IP address.

For more detailed instructions, see this article: https://hadooptutorials.info/2020/10/05/part-1-apache-hadoop-installation-on-single-node-cluster-with-google-cloud-virtual-machine/
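Before editing config files along these lines, it may help to see what /etc/hosts currently maps localhost and the machine's hostname to (the log in the question shows the client calling from 18k10018-data-mining/10.0.2.15). A minimal read-only sketch, assuming the standard hosts-file format (an address followed by one or more names, with # starting a comment); the function name is hypothetical and the script does not modify the file:

```python
from __future__ import annotations

import socket


def hosts_entries(text: str, name: str) -> list[str]:
    """Return every address that a hosts-format text maps `name` to, in file order."""
    addresses = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if name in names:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        hosts = f.read()
    hostname = socket.gethostname()
    print("localhost ->", hosts_entries(hosts, "localhost"))
    print(hostname, "->", hosts_entries(hosts, hostname))
```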