We are working on a Spark application that will be hosted on an Azure HDInsight Spark cluster. Our use case is to pull data from Azure Blob Storage, process it with Spark, and finally create or append results back to Azure Blob Storage. For the blob operations we use azure-storage-4.3.0.jar.
We use Maven in our Eclipse project and added the following dependency:
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-storage</artifactId>
    <version>4.3.0</version>
</dependency>
Compilation was successful, and the application even runs fine from the local machine, executing with no issues.
So we created an uber/fat jar from Eclipse, copied it to our Azure HDInsight Spark cluster, and ran the following command:
spark-submit --verbose --class myClassName MyUberJar.jar
The application encountered the following error:
Exception in thread "main" java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlockBlob.startCopy(Lcom/microsoft/azure/storage/blob/CloudBlockBlob;)Ljava/lang/String;
at com.lsy.airmon2.dao.blob.AzureStorageImpl.moveData(AzureStorageImpl.java:188)
at com.lsy.airmon2.processor.SurveyProcessor.stageData(SurveyProcessor.java:92)
at com.lsy.airmon2.processor.Processor.doJob(Processor.java:27)
at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.runP(AirMon2EntryPoint.java:109)
at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.run(AirMon2EntryPoint.java:82)
at com.lsy.airmon2.entrypoint.AirMon2EntryPoint.main(AirMon2EntryPoint.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
When we dug deeper into this issue, we identified that Azure HDInsight Spark already ships an older version of azure-storage (azure-storage-2.2.0.jar) under /usr/hdp/current/hadoop-client/lib, and this older version does not have the startCopy method; it was only added in azure-storage 3.0.0.
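A quick way to confirm which jar actually contains the method is to run javap against each jar on a headnode, for example:

# prints nothing against the cluster's 2.2.0 jar, but lists the startCopy overloads in 4.3.0
javap -classpath /usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar \
    com.microsoft.azure.storage.blob.CloudBlockBlob | grep startCopy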
So we replaced azure-storage-2.2.0.jar with azure-storage-3.0.0.jar on all the driver and worker nodes. After this change, the application encountered the following strange exception:
java.net.ConnectException: Call From hn0-FooBar/10.XXX.XXX.XXX to hn1-FooBar.xyzabcxyzabc.ax.internal.cloudapp.net:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1430)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:956)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:855)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:617)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:715)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
at org.apache.hadoop.ipc.Client.call(Client.java:1402)
... 14 more
So we reverted all the changes and are back to square one.
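One idea we have not tried yet is to make our uber jar self-contained instead of touching the cluster: relocating the com.microsoft.azure.storage package with the Maven Shade Plugin, so that our 4.3.0 classes can never collide with the cluster's 2.2.0 copy. A sketch of the pom configuration we have in mind (untested):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <!-- rewrite our bundled azure-storage classes (and all
                             references to them in our bytecode) into a private
                             namespace, so the cluster's older jar never wins -->
                        <pattern>com.microsoft.azure.storage</pattern>
                        <shadedPattern>shaded.com.microsoft.azure.storage</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>

We have also seen the spark.driver.userClassPathFirst and spark.executor.userClassPathFirst settings mentioned as an alternative way to prefer the jar's classes over the cluster's, but we understand those flags are experimental in our Spark version.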
Any suggestions on how to resolve this?