I'm having a problem building a Spark jar with Scala. It's a really simple thing: I want to programmatically access a MySQL server via JDBC and load a table into a Spark DataFrame. I can get this to work in the Spark shell, but I cannot package a jar that works with spark-submit. It packages fine, but fails at runtime with
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3310/100million
My spark-submit command is
./bin/spark-submit ~/path/to/scala/project/target/scala-2.10/complete.jar --driver-class-path ~/path/to/mysql-connector-java-5.1.37-bin.jar
My build.sbt looks like
name := "sql_querier"
version := "1.0"
scalaVersion := "2.10.4"
sbtVersion := "0.13.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1" % "provided"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.37"
assemblyJarName in assembly := "complete.jar"
mainClass in assembly := Some("sql_querier")
offline := true
and my very simple code is
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
object sql_querier {
  def main(args: Array[String]) {
    val sc = new org.apache.spark.SparkContext()
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val url = "jdbc:mysql://databaseurl:portno/database"
    val prop = new java.util.Properties
    prop.setProperty("user", "myuser")
    prop.setProperty("password", "mydatabase")
    val cats = sqlContext.read.jdbc(url, "categories", prop)
    cats.show
  }
}
I've replaced the real user, password and database URL with placeholders. I also have a file in the project directory that adds the sbt-assembly plugin, and I don't think there is anything wrong with it. When I start a Spark shell with the --driver-class-path option pointing to the MySQL jar, I can run the same commands and extract data from the MySQL database.
Any clue what I am doing wrong with the build would be greatly appreciated.
Dean
EDIT: I've tried a whole bunch of things, including different versions of the JDBC driver and adding the lines
sc.addJar("/Users/dean.wood/data_science/scala/sqlconn/mysql-connector-java-5.0.8-bin.jar")
Class.forName("com.mysql.jdbc.Driver")
to the scala file and the lines
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs@_*) =>
    xs.map(_.toLowerCase) match {
      case ("manifest.mf" :: Nil) |
           ("index.list" :: Nil) |
           ("dependencies" :: Nil) |
           ("license" :: Nil) |
           ("notice" :: Nil) => MergeStrategy.discard
      case _ => MergeStrategy.first // was 'discard' previously
    }
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
to the build file.
Nothing seems to help.
spark-submit ... --packages="mysql:mysql-connector-java:5.1.37" - Victor Moroz
--driver-class-path ~/path/to/mysql-connector-java-5.1.37-bin.jar. I think you might be missing the part where you load the driver: Class.forName("com.mysql.jdbc.Driver").newInstance? - marios
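One way to follow up on the driver-loading suggestion is to check explicitly whether the driver class is visible to the running JVM, and to name the driver class in the connection properties. The sketch below is only an illustration, not a confirmed fix: `JdbcDriverCheck`, `driverAvailable` and `jdbcProperties` are hypothetical names, and it relies on the assumption that the `driver` option documented for Spark's JDBC data source is also picked up from the `Properties` object passed to `read.jdbc` on Spark 1.5.1. It has no Spark dependency, so it can be run on its own.

```scala
import java.util.Properties
import scala.util.Try

// Hypothetical helper object; none of these names come from the original post.
object JdbcDriverCheck {
  // Returns true when the named class can be loaded from the current classpath.
  def driverAvailable(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  // Builds connection properties that also carry a "driver" entry.
  // Assumption: Spark's JDBC source reads this key to register the driver.
  def jdbcProperties(user: String, password: String, driverClass: String): Properties = {
    val prop = new Properties()
    prop.setProperty("user", user)
    prop.setProperty("password", password)
    prop.setProperty("driver", driverClass)
    prop
  }

  def main(args: Array[String]): Unit = {
    val driverClass = "com.mysql.jdbc.Driver"
    if (driverAvailable(driverClass))
      println(s"$driverClass found on the classpath")
    else
      println(s"$driverClass is missing from the classpath")

    val prop = jdbcProperties("myuser", "mypassword", driverClass)
    // In the Spark job these properties would be passed on as before:
    //   val cats = sqlContext.read.jdbc(url, "categories", prop)
    println(prop.getProperty("driver"))
  }
}
```

If the check reports the class as missing inside the submitted job, the connector is not reaching the driver JVM. Note that spark-submit treats everything after the application jar as arguments to the application itself, so options such as --driver-class-path or --packages need to appear before the jar path.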