I have been running Hadoop MapReduce jobs by connecting over SSH with PuTTY, which requires entering the host name or IP address, login name, and password to get an SSH command-line window. Once in the SSH console, I run the appropriate MapReduce command, such as:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.0.1.jar -file /nfs_home/appers/user1/mapper.py -file /nfs_home/appers/user1/reducer.py -mapper '/usr/lib/python_2.7.3/bin/python mapper.py' -reducer '/usr/lib/python_2.7.3/bin/python reducer.py' -input /ccexp/data/test_xml/0901282-510179094535002-oozie-oozi-W/extractOut//.xml -output /user/ccexptest/output/user1/MRoutput
What I would like to do is replace this clunky process with a Python script that launches the MapReduce job, so that I no longer have to log in over SSH with PuTTY.
Can this be done, and if so, can someone show me how?
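To make the goal concrete, here is a rough sketch of the kind of script I have in mind. It is not a tested solution. It assumes an OpenSSH `ssh` client is available on the local machine's PATH and that key-based login to the cluster is already set up, neither of which is part of my current PuTTY workflow. It also assumes the local script runs under Python 3.7 or later. The helper names (`build_streaming_args`, `to_remote_command`, `run_over_ssh`) are made up for illustration. The jar and interpreter paths are the ones from the command above.

```python
import shlex
import subprocess

# Paths copied from the command in the question.
STREAMING_JAR = ("/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/"
                 "hadoop-streaming-2.0.0-mr1-cdh4.0.1.jar")
CLUSTER_PYTHON = "/usr/lib/python_2.7.3/bin/python"


def build_streaming_args(mapper_path, reducer_path, input_path, output_path):
    """Return the hadoop streaming command as a list of arguments (hypothetical helper)."""
    # -file ships the script to the task nodes, so the mapper and reducer
    # commands refer to the scripts by file name only.
    mapper_name = mapper_path.rsplit("/", 1)[-1]
    reducer_name = reducer_path.rsplit("/", 1)[-1]
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-file", mapper_path,
        "-file", reducer_path,
        "-mapper", "{} {}".format(CLUSTER_PYTHON, mapper_name),
        "-reducer", "{} {}".format(CLUSTER_PYTHON, reducer_name),
        "-input", input_path,
        "-output", output_path,
    ]


def to_remote_command(args):
    """Join the arguments into one shell-quoted string for the remote shell."""
    return " ".join(shlex.quote(arg) for arg in args)


def run_over_ssh(host, user, args):
    """Run the command on the cluster through the local ssh client.

    Assumption: `ssh` is installed locally and key-based authentication
    is configured, so no password prompt appears.
    """
    return subprocess.run(
        ["ssh", "{}@{}".format(user, host), to_remote_command(args)],
        capture_output=True,
        text=True,
    )
```

Usage would be something like `result = run_over_ssh(host, user, build_streaming_args(...))`, followed by checking `result.returncode` and `result.stderr`, since Hadoop writes its job progress to standard error. If the script could run directly on a cluster node instead, the list from `build_streaming_args` could be passed straight to `subprocess.run` without the SSH step.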