3 votes

I'm running into a problem running my application on the EMR master node. It needs to access some AWS SDK methods added in version 1.11. All the required dependencies were bundled into a fat jar, and the application works as expected on my dev box.

However, when the app is executed on the EMR master node, it fails with a NoSuchMethodError when calling a method added in AWS SDK 1.11+, e.g.:

java.lang.NoSuchMethodError:
 com.amazonaws.services.sqs.model.SendMessageRequest.withMessageDeduplicationId(Ljava/lang/String;)Lcom/amazonaws/services/sqs/model/SendMessageRequest;

I tracked it down to the classpath parameter passed to the JVM instance started by spark-submit:

-cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/

In particular, it loads /usr/share/aws/aws-java-sdk/aws-java-sdk-sqs-1.10.75.1.jar instead of version 1.11.77 from my fat jar.

Is there a way to force Spark to use the AWS SDK version I need?

It looks like spark.executor.userClassPathFirst set to true should allow your provided jar to override the classpath params: spark.apache.org/docs/latest/configuration.html – Dave Maple
@DaveMaple: I tried adding --conf spark.driver.userClassPathFirst=true to the spark-submit command line. My app exits almost immediately with Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider. It looks like a version conflict to me. – Denis Makarenko
Darn, yeah. I guess we'd have to be selective then about only the AWS SDK. Will think on this. – Dave Maple
Shading (i.e. relocating to an alternative package name) the latest version of the com.amazonaws.services.sqs package doesn't work either. It turned out that AmazonSQSClient.init() calls HandlerChainFactory.newRequestHandlerChain("/com/amazonaws/services/sqs/request.handlers"), i.e. it uses a hard-coded package path, so it can't find the renamed one. – Denis Makarenko
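
For reference, the relocation described in the last comment would look roughly like the following with the maven-shade-plugin (the build tool isn't stated above, so this is only an illustrative sketch). As the comment explains, the relocated classes still fail at runtime because HandlerChainFactory looks up the request-handler resource under the original, hard-coded package path:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rename the bundled SQS classes so they cannot collide with
               the 1.10.x jars already on the EMR classpath -->
          <relocation>
            <pattern>com.amazonaws.services.sqs</pattern>
            <shadedPattern>shaded.com.amazonaws.services.sqs</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>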

1 Answer

2 votes

Here is what I learned trying to troubleshoot this.

The default classpath parameter is constructed using the spark.driver.extraClassPath setting from /etc/spark/conf/spark-defaults.conf. spark.driver.extraClassPath contains a reference to the older version of the AWS SDK, located in /usr/share/aws/aws-java-sdk/*.
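
The entry in question looks roughly like this (truncated here; the exact list varies by EMR release):

spark.driver.extraClassPath  /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:...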

To use the newer version of the AWS SDK, I uploaded the jars to a directory I created under the home directory and specified it in the --driver-class-path spark-submit parameter:

--driver-class-path '/home/hadoop/aws/*'
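
Put together, the spark-submit invocation looks something like this (the application jar and main class names below are placeholders, not taken from the question):

spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --driver-class-path '/home/hadoop/aws/*' \
  /home/hadoop/my-app-fat.jar

With this, the driver JVM resolves the SQS classes from /home/hadoop/aws rather than from /usr/share/aws/aws-java-sdk. If the same SDK calls were also made from executor code, the executor classpath (spark.executor.extraClassPath) would presumably need a similar override.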