I have downloaded (since I don't have space for running CDH or Sandbox) Hadoop 2.6.0 and hadoop streaming from here
I ran the command of
bin/hadoop jar contrib/hadoop-streaming-2.6.0.jar \
-file ${HADOOP_HOME}/py_mapred/mapper.py -mapper ${HADOOP_HOME}/py_mapred/mapper.py \
-file ${HADOOP_HOME}/py_mapred/reducer.py -reducer ${HADOOP_HOME}/py_mapred/reducer.py \
-input /input/davinci/* -output /input/davinci-output
where I stored the downloaded streaming jar in ${HADOOP_HOME}/contrib, and the other files in py_mapred. At the same time, I copyFromLocal to /input directory on hdfs. Now, when I run the command, the following lines show up:
15/08/14 17:35:45 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
15/08/14 17:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/usr/local/cellar/hadoop/2.6.0/py_mapred/mapper.py, /usr/local/cellar/hadoop/2.6.0/py_mapred/reducer.py, /var/folders/c5/4xfj65v15g91f71c_b9whnpr0000gn/T/hadoop-unjar3313567263260134566/] [] /var/folders/c5/4xfj65v15g91f71c_b9whnpr0000gn/T/streamjob9165494241574343777.jar tmpDir=null
15/08/14 17:35:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/14 17:35:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/14 17:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/14 17:35:48 INFO mapreduce.JobSubmitter: number of splits:2
15/08/14 17:35:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439538212023_0002
15/08/14 17:35:49 INFO impl.YarnClientImpl: Submitted application application_1439538212023_0002
15/08/14 17:35:49 INFO mapreduce.Job: The url to track the job: http://Jonathans-MacBook-Pro.local:8088/proxy/application_1439538212023_0002/
15/08/14 17:35:49 INFO mapreduce.Job: Running job: job_1439538212023_0002
It looks like the command has been accepted. I checked on localhost:8088 and the job does register. However it's not running, despite the fact that it says Running job: job_1439538212023_0002. Is there something wrong with my command? Is it due to permission setting? Why isn't the job running?
Thank you