I'm trying to connect Spark to Azure Blob Storage (wasbs). I added the following jars to the Hadoop classpath:

com.microsoft.azure_azure-storage-7.0.0.jar
org.apache.hadoop_hadoop-annotations-3.1.2.jar
org.apache.hadoop_hadoop-auth-3.1.2.jar
org.apache.hadoop_hadoop-azure-3.1.2.jar
org.apache.hadoop_hadoop-common-3.1.2.jar
org.eclipse.jetty_jetty-http-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-io-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-security-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-server-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-servlet-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-webapp-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-xml-9.3.24.v20180605.jar

I then run spark-submit with:

spark-submit --class mainClass --jars jars/org.apache.hadoop_hadoop-azure-3.1.2.jar,jars/com.microsoft.azure_azure-storage-7.0.0.jar,jars/org.apache.hadoop_hadoop-common-3.1.2.jar myjar.jar

and I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/ajax/JSON$Convertor

If I remove hadoop-common from the spark-submit --jars list, I get:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities

If I instead pass --jars jars/* to include all the jar files, jetty-util among them, I get:

java.lang.ClassNotFoundException: my.package.MainClass

I saw similar posts that blame multiple versions of Jetty on the classpath, but I can't find any other versions anywhere.

1 Answer

For the first exception, you're missing jetty-util:

https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-util/9.3.24.v20180605

You should also verify that the hadoop classpath command returns what you expect.
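
One way to do that check is to split the classpath on ":" and look for the jars you care about. This is only a sketch: the helper name classpath_entries_matching is made up for illustration, and it assumes the hadoop command is on your PATH. Note that hadoop classpath often prints directory wildcards (entries ending in /*) instead of individual jar names; if your Hadoop version supports it, hadoop classpath --glob expands them.

```shell
# List classpath entries whose path contains a given substring.
# $1: colon-separated classpath, $2: substring to look for (e.g. "jetty")
# (classpath_entries_matching is a hypothetical helper, not a Hadoop tool.)
classpath_entries_matching() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -F -- "$2"
}

# Only call the real CLI when it is installed on this machine.
if command -v hadoop >/dev/null 2>&1; then
  cp="$(hadoop classpath)"
  classpath_entries_matching "$cp" jetty || echo "no jetty entries on the Hadoop classpath"
  classpath_entries_matching "$cp" azure || echo "no azure entries on the Hadoop classpath"
fi
```

If the jetty or azure jars you added do not show up here, they are not where Hadoop is looking for them.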

For the remaining exceptions, you should verify that you can run hadoop fs -ls wasb://path on every node that may run a Spark executor.
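
On the ClassNotFoundException from --jars jars/* in particular: spark-submit expects --jars to be a single comma-separated list, while the shell expands jars/* into separate space-separated arguments, so only the first file ends up in --jars and the next one is likely taken as the application jar. A minimal sketch of building the comma-separated list instead, assuming the dependency jars live in a directory called jars as in the question (join_jars is a made-up helper name):

```shell
# Join every .jar file in a directory into one comma-separated string.
# $1: directory holding the dependency jars
# (join_jars is a hypothetical helper; the body runs in a subshell so its
# variables do not leak into the calling shell.)
join_jars() (
  joined=""
  for f in "$1"/*.jar; do
    [ -e "$f" ] || continue          # skip the literal pattern when nothing matches
    joined="${joined:+$joined,}$f"   # prepend a comma for every entry but the first
  done
  printf '%s\n' "$joined"
)

# Possible usage, reusing the class and jar names from the question:
# spark-submit --class my.package.MainClass --jars "$(join_jars jars)" myjar.jar
```

Quoting the command substitution keeps the whole list as one argument, so myjar.jar stays the application jar.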