I am trying to assemble a Scala 2.11 / Spark 2.0 application that uses hortonworks-spark/shc to access HBase.
The dependency set looks simple:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "com.hortonworks" % "shc-core" % "1.0.1-2.0-s_2.11"
)
The problem comes when I try to assemble the application into a fat JAR: there are many transitive dependencies with conflicting versions, so the assembly plugin throws deduplicate errors. One example:
deduplicate: different file contents found in the following:
[error] /home/search/.ivy2/cache/org.mortbay.jetty/jsp-2.1/jars/jsp-2.1-6.1.14.jar:org/apache/jasper/xmlparser/XMLString.class
[error] /home/search/.ivy2/cache/tomcat/jasper-compiler/jars/jasper-compiler-5.5.23.jar:org/apache/jasper/xmlparser/XMLString.class
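I have seen suggestions to work around this with a custom merge strategy in build.sbt. Below is a minimal sketch of what I have tried, assuming the sbt-assembly plugin; the specific first/discard choices are my guesses, not a verified configuration:

```scala
// build.sbt — assumes sbt-assembly is enabled in project/plugins.sbt
assemblyMergeStrategy in assembly := {
  // Overlapping jasper classes (jsp-2.1 vs jasper-compiler): keep the first copy found
  case PathList("org", "apache", "jasper", xs @ _*) => MergeStrategy.first
  // Signature files and manifests in META-INF cannot be merged meaningfully
  case PathList("META-INF", xs @ _*)                => MergeStrategy.discard
  // Fall back to the default strategy for everything else
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

This silences the deduplicate errors, but picking `first` blindly feels fragile, since I cannot tell which copy of each class is the correct one at runtime.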
Also, I don't know whether it is even correct to include dependencies like org.apache.hbase:hbase-server:1.1.2 in the JAR.
So, basically, the question is: does anyone know the right way to assemble a Scala Spark application using this library with sbt, and can you provide an example? (It might also be worth adding one to the hortonworks-spark/shc documentation.)
Note: hortonworks-spark/shc is not published to spark-packages, so I cannot use the --packages option except with a local copy of the JARs. I am using EMR, so I don't have a preconfigured cluster where I could copy the JAR without adding more complexity to the deployment.
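The alternative I have considered is skipping assembly for this dependency and shipping the library JAR explicitly at submit time. A sketch of what that would look like (the class name and JAR paths here are hypothetical placeholders for my setup):

```shell
# Ship shc-core alongside a thin application JAR instead of bundling it.
# Paths and the main class are placeholders, not a working configuration.
spark-submit \
  --class com.example.MyHBaseApp \
  --jars /home/hadoop/libs/shc-core-1.0.1-2.0-s_2.11.jar \
  /home/hadoop/my-app_2.11-0.1.jar
```

But on EMR this still means staging the JAR onto the cluster (or S3) as an extra deployment step, which is exactly the complexity I was hoping to avoid.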