I am trying to run a Spark job that accesses DynamoDB. The old way of instantiating a DynamoDB client has been deprecated, and the client builder is now the recommended approach.

This works fine locally, but when I deploy to EMR I get this error:

Exception in thread "main" java.lang.IllegalAccessError: tried to access class com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientConfigurationFactory from class com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClientBuilder

My code that causes this is:

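import com.amazonaws.regions.Regions
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClientBuilder

// Build an async DynamoDB client for us-east-1 using the client builder
val dynamoDbClient = AmazonDynamoDBAsyncClientBuilder
  .standard()
  .withRegion(Regions.US_EAST_1)
  .build()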

My build.sbt contains:

libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.114"

My spark-submit command looks like this:

spark-submit --conf spark.eventLog.enabled=false --packages com.typesafe.play:play-json_2.11:2.5.9,com.github.traviscrawford:spark-dynamodb:0.0.6,com.amazonaws:aws-java-sdk:1.11.114 --master yarn --deploy-mode cluster --class Main application.jar

Does anyone have any ideas? Am I overlooking something basic?

Update

I noticed that EMR was running OpenJDK 1.8 while my local system was running Oracle Java 1.8. I changed the EMR cluster to use the same Java as my local system, but nothing changed.

1 Answer

I don't have a perfect answer, but I'm struggling with a similar problem: a Spark driver built as a fat jar and running on EMR. Here is what I've tried so far.

  1. Run spark-submit with the -v option and check the logs for class paths and related details. As far as I can see, EMR loads its own aws-java-sdk as well. It's not clear to me which version of aws-java-sdk EMR is running; the notes for EMR release 4.7.0 state "Upgraded the AWS SDK for Java to 1.10.75" (http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew.html).
  2. Then add the argument --conf spark.driver.userClassPathFirst=true so that the aws-java-sdk version your driver specifies is loaded first.

Unfortunately, step 2 raises YARN errors such as: Unable to load YARN support ... (some discussion of that: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/spark-submit-fails-after-setting-userClassPathFirst-to-true/td-p/46778)
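
To complement the class path check in step 1, here is a minimal sketch that reports which jar a class was actually loaded from, using only standard JDK reflection. The object name ClassOrigin and the method locate are hypothetical helpers, not part of Spark or the AWS SDK. The assumption is that, if the two classes named in the error come from different jars or class loaders, that would likely explain the IllegalAccessError.

```scala
// Diagnostic sketch (hypothetical helper): report where a class was loaded from.
object ClassOrigin {

  /** Returns the code source location (jar or directory) of the named class,
    * or "unknown" when the JVM does not expose one (e.g. JDK bootstrap classes).
    */
  def locate(className: String): String = {
    // Load without initializing the class; we only want to know its origin.
    val cls = Class.forName(className, false, getClass.getClassLoader)
    val location = for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString
    location.getOrElse("unknown")
  }
}
```

In the driver's main method, before the client is built, one could print ClassOrigin.locate("com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClientBuilder") and ClassOrigin.locate("com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientConfigurationFactory") and compare the two locations in the YARN logs.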

Some discussion from the aws-sdk-java GitHub repository: https://github.com/aws/aws-sdk-java/issues/1094

Conclusion: for now, use only the APIs available in aws-java-sdk version 1.10.75.
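
As a rough sketch of what that could look like for the client in the question, assuming the cluster really does provide SDK 1.10.75: the older (now deprecated) constructor-based style avoids AmazonDynamoDBAsyncClientBuilder entirely. The object name LegacyDynamoDb is hypothetical, and the no-argument constructor relies on the SDK's default credentials provider chain.

```scala
import com.amazonaws.regions.{Region, Regions}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClient

// Sketch only: pre-builder client creation, deprecated in newer 1.11.x SDKs.
object LegacyDynamoDb {
  def client(): AmazonDynamoDBAsyncClient = {
    // Uses the default credentials provider chain.
    val c = new AmazonDynamoDBAsyncClient()
    // Same region as in the question.
    c.setRegion(Region.getRegion(Regions.US_EAST_1))
    c
  }
}
```

This trades a deprecation warning for not depending on classes that only exist in the newer SDK.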