
I'm having a problem building a Spark jar with Scala. It's a really simple thing: I want to programmatically access a MySQL server via JDBC and load the result into a Spark DataFrame. I can get this to work in the spark-shell, but I cannot package a jar that works with spark-submit. It packages fine, but when running it fails with

Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3310/100million

My spark-submit command is

./bin/spark-submit ~/path/to/scala/project/target/scala-2.10/complete.jar --driver-class-path ~/path/to/mysql-connector-java-5.1.37-bin.jar

My build.sbt looks like

name := "sql_querier"

version := "1.0"

scalaVersion := "2.10.4"

sbtVersion := "0.13.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1" % "provided"

libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.37"

assemblyJarName in assembly := "complete.jar"

mainClass in assembly := Some("sql_querier")

offline := true

and my very simple code is

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object sql_querier {

  def main(args: Array[String]) {

    val sc = new SparkContext()
    val sqlContext = new SQLContext(sc)
    val url = "jdbc:mysql://databaseurl:portno/database"

    val prop = new java.util.Properties
    prop.setProperty("user", "myuser")
    prop.setProperty("password", "mydatabase")

    val cats = sqlContext.read.jdbc(url, "categories", prop)
    cats.show()
  }
}

Where I've hidden the real values for the user, password, and database URL. I've also got a plugins file under project/ that adds the sbt-assembly plugin, and there is nothing wrong with it: I've used it to build fat jars before. When I start a spark-shell with the --driver-class-path option pointing to the mysql jar, I can run these commands and extract data from the MySQL database.
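For completeness, that plugins file is just the standard sbt-assembly one-liner, something like the following (the exact plugin version here is approximate):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")  // version approximate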

Any clue what I am doing wrong with the build would be greatly appreciated.

Dean

EDIT: I've tried a whole bunch of things, including different versions of the JDBC driver, and adding the lines

sc.addJar("/Users/dean.wood/data_science/scala/sqlconn/mysql-connector-java-5.0.8-bin.jar")
Class.forName("com.mysql.jdbc.Driver")

to the Scala file, and the lines

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs@_*) =>
    xs.map(_.toLowerCase) match {
      case ("manifest.mf" :: Nil) |
           ("index.list" :: Nil) |
           ("dependencies" :: Nil) |
           ("license" :: Nil) |
           ("notice" :: Nil) => MergeStrategy.discard
      case _ => MergeStrategy.first // was 'discard' previously
    }
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}

to the build file.

Nothing seems to help.

Looks like the mysql driver is not in the fat jar. Which task did you run? assembly? - Fatih Donmez
You can try spark-submit ... --packages="mysql:mysql-connector-java:5.1.37" - Victor Moroz
@FatihDonmez he's not using the assembly task or his build file would have been different. But for Dean, this error is, as Fatih said, due to the absence of the mysql connector jar from your application jar, and you need to use the sbt-assembly plugin to create an uber jar with all the needed dependencies. Another solution is to do as Victor said, but eventually you'll need lots of dependencies if you are building bigger projects, and that's not the best solution you have. - eliasah
I don't think you need this: --driver-class-path ~/path/to/mysql-connector-java-5.1.37-bin.jar. I think you might be missing the part where you load the driver: Class.forName("com.mysql.jdbc.Driver").newInstance? - marios
I am building with sbt assembly. Like I said in the question, I have a plugins file with the plugin stuff in it, and I have used this file to build a fat jar before. I completely agree it looks like the mysql driver is not in the jar, but why not? It should be. If I try marios' suggestion I get the same error. Anyway, this shouldn't be necessary, as it's not necessary in the REPL. I'll try Victor's suggestion. - Dean
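(Victor's suggestion, spelled out against the jar from the question, would look something like the line below; --packages has spark-submit resolve the connector from Maven at launch time:)

./bin/spark-submit --packages mysql:mysql-connector-java:5.1.37 ~/path/to/scala/project/target/scala-2.10/complete.jar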

1 Answer


Solved it. I was doing nothing wrong in the build file or the Scala file.

Turns out spark-submit only looks at --driver-class-path if it comes before the path to the application jar; everything after the jar is passed as arguments to your main method, so the option was being silently swallowed. To get it to work, instead of the spark-submit command above I used:

./bin/spark-submit --driver-class-path ~/path/to/mysql-connector-java-5.1.37-bin.jar ~/path/to/scala/project/target/scala-2.10/complete.jar 

I suspect that to scale this to a cluster I'd have to make the mysql connector available on each worker too, but that is for another day.
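For the record, a sketch of what that might look like, assuming --jars behaves as documented (it ships the listed jars to the executors and adds them to their classpaths):

./bin/spark-submit --driver-class-path ~/path/to/mysql-connector-java-5.1.37-bin.jar --jars ~/path/to/mysql-connector-java-5.1.37-bin.jar ~/path/to/scala/project/target/scala-2.10/complete.jar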