
This is what I have understood so far from reading various sources on the internet.

Avro and avro-mapred are not part of CDH4 (the Cloudera distribution), so I have to set them up manually using HADOOP_CLASSPATH=avro.jar:avro-mapred.jar

I have done that, but when I run my job on my pseudo-distributed cluster it throws the following exception:

13/12/27 00:47:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/12/27 00:47:40 INFO mapred.FileInputFormat: Total input paths to process : 1

13/12/27 00:47:41 INFO mapred.JobClient: Running job: job_201312221245_0017

13/12/27 00:47:42 INFO mapred.JobClient: map 0% reduce 0%

13/12/27 00:47:57 INFO mapred.JobClient: Task Id : attempt_201312221245_0017_m_000000_0, Status : FAILED

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found

I'm running the job as follows:

hadoop jar build/libs/hadoop-boilerplate-1.0.jar CustomerMapReduce transactions/input transactions/output1 -libjars /path/to/libs/avro-1.7.4.jar,/path/to/libs/avro-mapred-1.7.4.jar

Finally found the problem. The job works when run as: hadoop jar build/libs/hadoop-boilerplate-1.0.jar AvroMain -libjars jars/avro-mapred-1.7.4.jar,jars/avro-1.7.4.jar -files jars/avro-mapred-1.7.4.jar,jars/avro-1.7.4.jar transactions/input transactions/output2 (the -libjars argument has to be in exactly the position shown here, before the input and output paths). - user2272480
I also had to run: export HADOOP_CLASSPATH=jars/avro-mapred-1.7.4.jar:jars/avro-1.7.4.jar. See this link for more: grepalex.com/2013/02/25/hadoop-libjars - user2272480

1 Answer


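You should implement Tool, launch the job through ToolRunner, and use getConf() for the job configuration. ToolRunner runs GenericOptionsParser over the arguments, so generic options such as -libjars and -files are applied to the Configuration that getConf() returns.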

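import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SomeClass extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Configuration injected by ToolRunner, including generic options such as -libjars
        Configuration conf = getConf();
        ...
    }
}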
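
As for why the position of -libjars mattered in the comments on the question: as far as I know, Hadoop's generic option parsing stops at the first argument that is not an option, so anything placed after the input and output paths is handed to the job as a plain argument. Below is a rough, plain-Java sketch of that behaviour. OptionOrderDemo and its parse method are hypothetical names made up for illustration; this is a simplified stand-in, not Hadoop's GenericOptionsParser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration only; this is NOT Hadoop's GenericOptionsParser.
final class OptionOrderDemo {
    final Map<String, String> options = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    // Consume leading "-name value" pairs, then stop at the first non-option token.
    static OptionOrderDemo parse(String[] args) {
        OptionOrderDemo parsed = new OptionOrderDemo();
        int i = 0;
        while (i + 1 < args.length && args[i].startsWith("-")) {
            parsed.options.put(args[i], args[i + 1]);
            i += 2;
        }
        // Everything from here on is left for the job as ordinary arguments.
        for (; i < args.length; i++) {
            parsed.remaining.add(args[i]);
        }
        return parsed;
    }

    public static void main(String[] args) {
        String[] optionsFirst = {"-libjars", "avro.jar,avro-mapred.jar", "in", "out"};
        String[] optionsLast = {"in", "out", "-libjars", "avro.jar,avro-mapred.jar"};
        // Options placed before the paths are recognised.
        System.out.println(parse(optionsFirst).options.containsKey("-libjars"));
        // Options placed after the paths are treated as plain arguments.
        System.out.println(parse(optionsLast).options.containsKey("-libjars"));
    }
}
```

With the options first, the sketch picks up -libjars and leaves only the two paths as job arguments. With the options last, nothing is recognised and all four tokens fall through to the job, which matches the ClassNotFoundException seen with the original command.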