My Spark application fails to run on an AWS EMR cluster. I noticed that this is because some classes are loaded from the classpath set by EMR rather than from my application jar. For example:
java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
at com.sksamuel.avro4s.SchemaFor$.fieldBuilder(SchemaFor.scala:424)
at com.sksamuel.avro4s.SchemaFor$.fieldBuilder(SchemaFor.scala:406)
Here, org.apache.avro.Schema is loaded from "jar:file:/usr/lib/spark/jars/avro-1.7.7.jar!/org/apache/avro/Schema.class".
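This is roughly how I'm checking which jar the class is resolved from (a minimal sketch; the wrapper object name is made up for illustration):

    // Prints the location org.apache.avro.Schema was actually loaded from.
    // Class.getResource with a relative name resolves inside the class's own
    // package, so "Schema.class" maps to /org/apache/avro/Schema.class.
    object WhereIsAvroSchema {
      def main(args: Array[String]): Unit = {
        println(classOf[org.apache.avro.Schema].getResource("Schema.class"))
        // On the cluster this prints:
        // jar:file:/usr/lib/spark/jars/avro-1.7.7.jar!/org/apache/avro/Schema.class
      }
    }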
com.sksamuel.avro4s, by contrast, depends on Avro 1.8.1. My application is built as a fat jar that bundles Avro 1.8.1. Why isn't that version loaded, instead of the 1.7.7 picked up from the classpath EMR sets?
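I did come across Spark's experimental userClassPathFirst settings, which are supposed to invert this lookup order; a minimal sketch of setting them (I haven't verified how they behave on EMR, and they don't explain the default behavior):

    // Sketch: experimental Spark settings that give classes from the user jar
    // precedence over the cluster's jars. Equivalent to passing
    //   --conf spark.driver.userClassPathFirst=true
    //   --conf spark.executor.userClassPathFirst=true
    // to spark-submit. The wrapper object is just for illustration.
    import org.apache.spark.SparkConf

    object UserClassPathFirstSketch {
      val conf = new SparkConf()
        .set("spark.driver.userClassPathFirst", "true")
        .set("spark.executor.userClassPathFirst", "true")
    }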
This is just one example; I see the same thing with other libraries I include in my application. Maybe Spark depends on Avro 1.7.7 and I'd have to shade those dependencies (sketched below). But why aren't the classes included in my app jar loaded first?
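For concreteness, the shading I have in mind would look roughly like this, assuming sbt-assembly (the "shaded" prefix is a placeholder):

    // build.sbt -- sketch of shading with sbt-assembly, assuming that's the fix.
    // Renames Avro's packages inside the fat jar (and rewrites all references
    // to them via .inAll) so they can no longer collide with EMR's Avro 1.7.7.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("org.apache.avro.**" -> "shaded.org.apache.avro.@1").inAll
    )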