I'm running into a problem running my application on the EMR master node. It needs to access some AWS SDK methods added in version 1.11. All the required dependencies were bundled into a fat jar, and the application works as expected on my dev box.
However, when the app is executed on the EMR master node, it fails with a NoSuchMethodError exception on any call to a method added in AWS SDK 1.11+, e.g.:
java.lang.NoSuchMethodError:
com.amazonaws.services.sqs.model.SendMessageRequest.withMessageDeduplicationId(Ljava/lang/String;)Lcom/amazonaws/services/sqs/model/SendMessageRequest;
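For context, the failing call has roughly this shape (a minimal sketch; the queue URL and message values are placeholders, and the client construction is just one option, but withMessageDeduplicationId is the 1.11+ FIFO-queue method named in the stack trace):

import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.SendMessageRequest;

AmazonSQSClient sqs = new AmazonSQSClient(); // default credential chain

// withMessageDeduplicationId/withMessageGroupId exist only in SDK 1.11+,
// so this chain throws NoSuchMethodError when a 1.10.x jar is loaded first.
SendMessageRequest request = new SendMessageRequest()
        .withQueueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo") // placeholder
        .withMessageBody("hello")
        .withMessageDeduplicationId("dedup-1")
        .withMessageGroupId("group-1");
sqs.sendMessage(request);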
I tracked it down to the classpath parameter passed to the JVM instance started by spark-submit:
-cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/
In particular, it loads /usr/share/aws/aws-java-sdk/aws-java-sdk-sqs-1.10.75.1.jar instead of version 1.11.77 from my fat jar.
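A quick way to confirm which jar actually wins at runtime (plain Java, nothing EMR-specific) is to ask the class for its code source:

// Prints the jar the class was actually loaded from, e.g.
// file:/usr/share/aws/aws-java-sdk/aws-java-sdk-sqs-1.10.75.1.jar
System.out.println(
    com.amazonaws.services.sqs.model.SendMessageRequest.class
        .getProtectionDomain().getCodeSource().getLocation());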
Is there a way to force Spark to use the AWS SDK version I need?
spark.executor.userClassPathFirst set to true should allow your provided jar to override the classpath params: spark.apache.org/docs/latest/configuration.html – Dave Maple
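For reference, that setting can be passed straight to spark-submit (the main class and jar name below are placeholders); per the linked docs, spark.driver.userClassPathFirst is the driver-side counterpart, but it is experimental and applies in cluster mode only:

# com.example.MyApp and my-app-fat.jar are placeholders
spark-submit \
  --class com.example.MyApp \
  --conf spark.executor.userClassPathFirst=true \
  --conf spark.driver.userClassPathFirst=true \
  my-app-fat.jar

Both flags are marked experimental in the Spark docs, and giving user jars precedence can surface classloader conflicts with Spark's own dependencies, so the whole job is worth retesting after enabling them.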