32
votes

So I've been using sbt with assembly to package all my dependencies into a single jar for my Spark jobs. I've got several jobs where I use c3p0 to set up connection pool information, broadcast that out, and then use foreachPartition on the RDD to grab a connection and insert the data into the database (roughly the pattern sketched below). In my sbt build script, I include

"mysql" % "mysql-connector-java" % "5.1.33"

This makes sure the JDBC connector is packaged up with the job. Everything works great.
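For reference, those jobs follow roughly this pattern (simplified: placeholder table and column names, an RDD[(Int, String)] as input, and DriverManager used directly here instead of the actual c3p0 pool):

import java.sql.DriverManager

// broadcast the connection info, then open one connection per partition and insert the rows
val jdbcUrl = "jdbc:mysql://some.domain.com/myschema?user=user&password=password"
val urlBc = sc.broadcast(jdbcUrl)

rdd.foreachPartition { rows =>
  val conn = DriverManager.getConnection(urlBc.value)  // the driver ships inside the assembly jar
  val stmt = conn.prepareStatement("INSERT INTO mytable (id, name) VALUES (?, ?)")
  try {
    rows.foreach { case (id, name) =>
      stmt.setInt(1, id)
      stmt.setString(2, name)
      stmt.executeUpdate()
    }
  } finally {
    stmt.close()
    conn.close()
  }
}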

So recently I started playing around with SparkSQL and realized it's much easier to simply take a DataFrame and save it to a JDBC source with the new features in 1.3.0.
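The save itself is something along these lines (df, the table name, and the URL are placeholders; in 1.3.0 the DataFrame API exposes insertIntoJDBC / createJDBCTable for this):

val url = "jdbc:mysql://some.domain.com/myschema?user=user&password=password"
// write the DataFrame's rows into an existing MySQL table
df.insertIntoJDBC(url, "mytable", false)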

I'm getting the following exception:

java.sql.SQLException: No suitable driver found for jdbc:mysql://some.domain.com/myschema?user=user&password=password
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)

When I was running this locally I got around it by setting

SPARK_CLASSPATH=/path/where/mysql-connector-is.jar

Ultimately what I want to know is: why is the job not capable of finding the driver when it should be packaged up with it? My other jobs never had this problem. From what I can tell, both c3p0 and the DataFrame code make use of java.sql.DriverManager (which handles loading the registered drivers for you, as far as I can tell), so it should work just fine. If there is something that prevents the assembly method from working, what do I need to do to make this work?

How do you launch your jobs? Did you check that your assembled jar includes the MySQL driver? – Daniel Darabos
I did check the jar, it does contain the MySQL driver. I launch my job using bin/spark-submit --class "com.mypackage.MyJob" --verbose spark://place.where.this.exists.com:7077 MyJob.jar – Adam Ritter
I have the same problem, also trying to save to mysql. Did you ever get to the bottom of this? – Marcin

10 Answers

36
votes

This person was having a similar issue: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-DataFrame-with-MySQL-td22178.html

Have you updated your connector drivers to the most recent version? Also, did you specify the driver class when you called load()?

Map<String, String> options = new HashMap<String, String>();
options.put("url", "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456");
options.put("dbtable", "video");
// explicitly name the driver class; for Connector/J 5.1.x this is com.mysql.jdbc.Driver
// (com.mysql.cj.jdbc.Driver only applies to Connector/J 8.x)
options.put("driver", "com.mysql.jdbc.Driver");
DataFrame jdbcDF = sqlContext.load("jdbc", options);

In spark/conf/spark-defaults.conf, you can also set spark.driver.extraClassPath and spark.executor.extraClassPath to the path of your MySQL driver .jar.

20
votes

These options are clearly mentioned in the Spark docs: --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

The mistake I was making was putting these options after my application's jar.

However, the correct way is to specify these options immediately after spark-submit:

spark-submit --driver-class-path /somepath/project/mysql-connector-java-5.1.30-bin.jar --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar

18
votes

Both the Spark driver and the executors need the MySQL driver on their classpath, so specify:

spark.driver.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath = <path>/mysql-connector-java-5.1.36.jar

10
votes

With Spark 2.2.0, the problem was corrected for me by adding the extra classpath information for the SparkSession in the Python script:

    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.driver.extraClassPath", "/path/to/jdbc/driver/postgresql-42.1.4.jar") \
        .getOrCreate()

See the official documentation: https://spark.apache.org/docs/latest/configuration.html

In my case, Spark is not launched from a CLI command, but from the Django framework: https://www.djangoproject.com/

6
votes

spark.driver.extraClassPath does not work in client mode:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

The SPARK_CLASSPATH environment variable has been deprecated since Spark 1.0.

You should first copy the JDBC driver jars onto each executor under the same local filesystem path and then use the following options in your spark-submit:

--driver-class-path "driver_local_file_system_jdbc_driver1.jar:driver_local_file_system_jdbc_driver2.jar"
--conf "spark.executor.extraClassPath=executors_local_file_system_jdbc_driver1.jar:executors_local_file_system_jdbc_driver2.jar"

For example, in the case of Teradata you need both terajdbc4.jar and tdgssconfig.jar.
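
Putting it together, a full spark-submit call for that Teradata case might look something like this (jar paths, class name, and application jar are placeholders; the jars must exist at the same local path on every node):

spark-submit \
  --driver-class-path "/opt/jdbc/terajdbc4.jar:/opt/jdbc/tdgssconfig.jar" \
  --conf "spark.executor.extraClassPath=/opt/jdbc/terajdbc4.jar:/opt/jdbc/tdgssconfig.jar" \
  --class com.mycompany.MyJob \
  my-job-assembly.jar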

Alternatively, modify compute_classpath.sh on all worker nodes; the Spark documentation says:

The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.

3
votes

There is a simple Java trick to solve your problem: explicitly load the driver class with Class.forName(). For example:

import java.sql.DriverManager
import org.apache.spark.rdd.{JdbcRDD, RDD}

val customers: RDD[(Int, String)] = new JdbcRDD(sc, () => {
    // loading the driver class registers it with DriverManager on each executor
    Class.forName("com.mysql.jdbc.Driver")
    DriverManager.getConnection(jdbcUrl)
  },
  "SELECT id, name FROM customer WHERE ? < id AND id <= ?",
  0, range, partitions, r => (r.getInt(1), r.getString(2)))

Check the docs

2
votes

A simple and easy way is to copy "mysql-connector-java-5.1.47.jar" into the "spark-2.4.3\jars\" directory.

1
votes

I had the same problem running jobs over a Mesos cluster in cluster mode.

To use a JDBC driver, it is necessary to add the dependency to the system classpath, not the framework classpath. The only way I found to do this was by adding the dependency to the spark-defaults.conf file on every instance of the cluster.

The properties to add are spark.driver.extraClassPath and spark.executor.extraClassPath, and the path must be on the local file system.
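
For example, the lines added to spark-defaults.conf on every node might look like this (the jar path is a placeholder and must exist locally on each node):

spark.driver.extraClassPath   /opt/jdbc/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath /opt/jdbc/mysql-connector-java-5.1.36.jar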

1
votes

I added the jar file to SPARK_CLASSPATH in spark-env.sh, and it works:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/local/spark-1.6.3-bin-hadoop2.6/lib/mysql-connector-java-5.1.40-bin.jar

-1
votes

I was facing the same issue when trying to run the spark-shell command from my Windows machine. The path that you pass for the driver location, as well as for the jar that you are using, should be in double quotes; otherwise it gets misinterpreted and you will not get the output that you want.

You would also have to install the JDBC driver for SQL Server from the link: JDBC Driver

I used the command below, and it works fine on my Windows machine:

spark-shell --driver-class-path "C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\jre8\sqljdbc42.jar" --jars "C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\jre8\sqljdbc42.jar"