
We are working on a Spark application that will be hosted on an Azure HDInsight Spark cluster. Our use case is to pull data from Azure Blob Storage, process it with Spark, and finally create or append data back to Azure Blob Storage, so we used azure-storage-4.3.0.jar.
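
To make the use case concrete, the read-process-write pattern looks roughly like the sketch below. This is only an illustration: the container/account names and the filter step are placeholders, not our actual job logic.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class BlobRoundTripSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BlobRoundTripSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // wasb:// is the scheme HDInsight uses for Azure Blob Storage; the container
        // and account names below are placeholders.
        JavaRDD<String> input = sc.textFile(
                "wasb://mycontainer@myaccount.blob.core.windows.net/input/");

        // Stand-in for the real processing: keep non-empty lines.
        JavaRDD<String> processed = input.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) {
                return !line.isEmpty();
            }
        });

        // Write the result back to Blob Storage.
        processed.saveAsTextFile(
                "wasb://mycontainer@myaccount.blob.core.windows.net/output/run-001");

        sc.stop();
    }
}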

We use Maven in our Eclipse project and added the following dependency:

<dependency>
     <groupId>com.microsoft.azure</groupId>
     <artifactId>azure-storage</artifactId>
     <version>4.3.0</version>
</dependency>

Compilation was successful, and the application even runs fine from the local machine, executing with no issues.

So we created an uber/fat JAR from Eclipse, ported it to our Azure HDInsight Spark cluster, and ran the following command:

spark-submit --class myClassName MyUberJar.jar --verbose

The application encountered the following error:

Exception in thread "main" java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlockBlob.startCopy(Lcom/microsoft/azure/storage/blob/CloudBlockBlob;)Ljava/lang/String;
            at com.lsy.airmon2.dao.blob.AzureStorageImpl.moveData(AzureStorageImpl.java:188)
            at com.lsy.airmon2.processor.SurveyProcessor.stageData(SurveyProcessor.java:92)
            at com.lsy.airmon2.processor.Processor.doJob(Processor.java:27)
            at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.runP(AirMon2EntryPoint.java:109)
            at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.run(AirMon2EntryPoint.java:82)
            at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.main(AirMon2EntryPoint.java:42)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

When we dug deeper into this issue we found that Azure HDInsight Spark already ships with an older version of azure-storage (azure-storage-2.2.0.jar) under /usr/hdp/current/hadoop-client/lib, and this older version does not have the startCopy method; that method was added in azure-storage-3.0.0.
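
For reference, the failing call corresponds to a blob copy of roughly the following shape. This is only a hedged sketch with placeholder names and a hypothetical helper, not the actual AzureStorageImpl.moveData implementation:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

public class BlobMoveSketch {
    // Hypothetical helper illustrating a "move" as copy-then-delete.
    public static void moveBlob(String connectionString, String containerName,
                                String sourceName, String targetName) throws Exception {
        CloudStorageAccount account = CloudStorageAccount.parse(connectionString);
        CloudBlobClient client = account.createCloudBlobClient();
        CloudBlobContainer container = client.getContainerReference(containerName);

        CloudBlockBlob source = container.getBlockBlobReference(sourceName);
        CloudBlockBlob target = container.getBlockBlobReference(targetName);

        // startCopy(CloudBlockBlob) exists in azure-storage 3.0.0+; when the older
        // 2.2.0 jar wins on the cluster classpath, this line throws NoSuchMethodError.
        target.startCopy(source);
        source.deleteIfExists();
    }
}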

So we replaced azure-storage-2.2.0.jar with azure-storage-3.0.0.jar on all the driver and worker nodes. After this change, the application encountered this strange exception:

java.net.ConnectException: Call From hn0-FooBar/10.XXX.XXX.XXX to hn1-FooBar.xyzabcxyzabc.ax.internal.cloudapp.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1430)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
    at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:956)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:855)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:617)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:715)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
    at org.apache.hadoop.ipc.Client.call(Client.java:1402)
    ... 14 more

So we reverted all the changes and are back to square one.

Any suggestion on how to resolve this?

You can't just replace JARs without proper compilation; you have to get the Spark source code, add the jar, recompile and deploy. The connection exception you see is coming from the headnode not being online; maybe you should take a look at hn1 and its logs. - Thomas Jungblut
@ThomasJungblut: You are right that we can't just replace the JARs, and as I mentioned we reverted the system... Next we tried passing azure-storage-4.3.0.jar from outside using the spark-submit --jars parameter, but that also didn't work. Now I suspect that this is due to the older azure-storage-2.2.0.jar already being loaded in the JVM, so the new methods referenced from 4.3.0 fail. - Nihal Bhagchandani
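
For the record, the --jars attempt described in the comment above presumably looked roughly like this (the jar is assumed to sit in the working directory; adjust the path as needed):

spark-submit --jars azure-storage-4.3.0.jar --class myClassName MyUberJar.jar --verbose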

1 Answer


Try using the --packages switch in the spark-submit command.

For example, I've used this in previous applications (although not with uber jars):

--packages com.microsoft.azure:azure-storage:8.0.0

So it should look something like this:

spark-submit --packages com.microsoft.azure:azure-storage:8.0.0 --class myClassName MyUberJar.jar --verbose