I'm getting an error that I think has to do with how I set up my directories.
After running:
hadoop jar hadoop-*.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input cs4501input -output py_wc_out
I get:
packageJobJar: [mapper.py, reducer.py, /tmp/hadoop-ubuntu/hadoop-unjar6120166906857088018/] [] /tmp/streamjob1341652915014758694.jar tmpDir=null
12/04/08 01:34:01 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201204080100_0004
12/04/08 01:34:01 ERROR streaming.StreamJob: Error launching job , Output path already exists : Output directory hdfs://localhost:9000/user/ubuntu/py_wc_out already exists
Streaming Job Failed!
I think it has to do with specifying hdfs in the core-site.xml file, but that setting came straight from the quick start guide. I don't understand why I need to put hdfs in front of the localhost address and port number.