
I am trying to run Hadoop streaming jobs with the following commands from a shell script:

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.19.2-streaming.jar -input $1 -output Twitter/Net.pegasus -mapper 'mapper.py Reverse' -reducer NONE -file mapper.py
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.19.2-streaming.jar -input $1 -output Twitter/Net.exclude -mapper 'mapper.py Reverse' -reducer reducer.py -file mapper.py -file reducer.py -file ../twitter/exclude.txt
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.19.2-streaming.jar -input $1 -output Twitter/Net.complete -mapper 'mapper.py Reverse' -reducer reducer.py -file mapper.py -file reducer.py

I am getting the following error:

/usr/bin/env: python2.5: No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

However, a higher version of Python is already installed:

$ which python
$ python --version
Python 2.7.3

I read in another post that running "apt-get install python2.5" would fix this, but that package isn't available. I also tried adding #!/usr/bin/env python to the top of my Python script, and that didn't work either.

Hadoop Streaming Job failed error in python

It looks like you have an executable Python script that starts with this line: #!/usr/bin/env python2.5. In that case, there is supposed to be a binary called python2.5 that is invoked to run the script. Does that version of Python exist on your machine? Run /usr/bin/env python2.5 at the command line and see. - hughdbrown
@hughdbrown I didn't have that version on my machine, and one of the Python scripts (mapper.py) started with this line: #!/usr/bin/env python2.5. I changed it to #!/usr/bin/env python and it worked. Thanks - vik
#!/usr/bin/env python looks for python in the PATH. If that does not work even though /usr/bin/python exists on your filesystem, it implies that /usr/bin/ is not in the PATH in the execution context where you start your program. - Charles Duffy

1 Answer

$ which python
$ python --version
Python 2.7.3

This indicates that you have a higher version of Python installed than the default. You have to make sure the same is true on every node in the cluster. For example, I work with a Red Hat 6.4 cluster where the default Python is 2.6, I think, but I have customized the master node a bit, so it differs from the datanodes. You therefore need to configure your mapper to call a Python that exists on every node.



Putting a shebang line that names an interpreter available on every node at the beginning of the Python script always works for me.
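
To check this before submitting a job, something like the following rough sketch can be run on each node against mapper.py and reducer.py. It reads the script's first line, works out which interpreter the shebang asks for, and looks that name up the same way env does, through PATH. The helper names (shebang_command, interpreter_name, check_script) are made up for this illustration, and the sketch assumes the checker itself runs under Python 3.3 or newer, since it relies on shutil.which. A missing interpreter here corresponds to the "No such file or directory" message and exit code 127 in the question.

```python
import os
import shutil
import sys


def shebang_command(script_path):
    """Return the words on the script's '#!' line, or None if it has none."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].split()


def interpreter_name(command):
    """Pick the program that would actually run the script.

    For '/usr/bin/env python2.5' that is 'python2.5';
    for '/usr/bin/python' it is the path itself.
    """
    if not command:
        return None
    if os.path.basename(command[0]) == "env":
        # Simplification: skip anything that looks like an env option.
        args = [word for word in command[1:] if not word.startswith("-")]
        return args[0] if args else None
    return command[0]


def check_script(script_path, search_path=None):
    """Return (interpreter, resolved_path); resolved_path is None if not found.

    search_path=None means "use the current PATH", which is what env does.
    """
    name = interpreter_name(shebang_command(script_path))
    if name is None:
        return None, None
    return name, shutil.which(name, path=search_path)


if __name__ == "__main__":
    for script in sys.argv[1:]:
        name, resolved = check_script(script)
        if name is None:
            print("%s: no shebang line" % script)
        elif resolved is None:
            print("%s: interpreter %r not found on this node" % (script, name))
        else:
            print("%s: %r resolves to %s" % (script, name, resolved))
```

Saved under a hypothetical name such as check_shebang.py, it could be invoked as `python3 check_shebang.py mapper.py reducer.py`. Keep in mind that the PATH seen by a Hadoop task may differ from the PATH of an interactive login shell, so a successful check in a terminal is a good sign rather than a guarantee.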