
I have a problem connecting to my PostgreSQL database from a Spark application that I launch on a Bluemix Apache-Spark service cluster with the spark-submit.sh script.

My Scala code is:

val conf = new SparkConf().setAppName("My demo").setMaster("local")
 val sc = new SparkContext(conf)
 val sqlContext = new SQLContext(sc)
 val driver = "org.postgresql.Driver"
 val url = "jdbc:postgresql://aws-us-east-1-portal.16.dblayer.com:10394/tennisdb?user=***&password=***"
 try {
   val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> url, "driver" -> driver, "dbtable" -> "inputdata")).load()
 } catch {
   case e : Throwable => {

I'm using an sbt build file to resolve the dependencies. The build file is:

 name := "spark-sample"

 version := "1.0"

 scalaVersion := "2.10.4"

 // Adding spark modules dependencies

 val sparkModules = List("spark-core",

 val sparkDeps = sparkModules.map( module => "org.apache.spark" % s"${module}_2.10" % "1.4.0" )     

 libraryDependencies ++= sparkDeps

 libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"

Then I use the sbt package command to create a jar of my application so that I can run it on a cluster with the Bluemix Apache-Spark service. The jar is created successfully and the application runs locally without any errors. But when I submit the application to the Bluemix Apache-Spark service with the spark-submit.sh script, I get a ClassNotFoundException for org.postgresql.Driver.


Another easy way to do this: put all the library jars in the directory where your application jar is, and tell spark-submit.sh to pick them up.

[charles@localhost tweetoneanalyzer]$ spark-submit --jars $(echo application/*.jar | tr ' ' ',') --class "SparkTweets" --master local[3] application/spark-sample.jar

In the example above, spark-submit uploads every jar listed by the --jars flag (here, everything under the application folder) to the server. So put any library jars you use in that folder, in your case the PostgreSQL driver (postgresql-9.1-901-1.jdbc4.jar), and pass your application jar as the final argument, application/spark-sample.jar.
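The echo/tr one-liner above breaks if a jar path contains a space, and it passes the literal pattern along when the folder is empty. A minimal sketch of a safer way to build the comma-separated list that --jars expects; join_jars is a hypothetical helper name, not part of spark-submit:

```shell
#!/bin/sh
# join_jars DIR: print the .jar files in DIR as one comma-separated list,
# the format that spark-submit's --jars flag expects.
# (join_jars is a hypothetical helper, not something Spark provides.)
join_jars() {
    dir=$1
    list=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no jars.
        [ -e "$jar" ] || continue
        if [ -z "$list" ]; then
            list=$jar
        else
            list="$list,$jar"
        fi
    done
    printf '%s\n' "$list"
}
```

With that defined, the command from the answer would become:

    spark-submit --jars "$(join_jars application)" --class "SparkTweets" --master local[3] application/spark-sample.jar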




You should use sbt assembly to create the jar file that runs on the cluster.

sbt assembly creates a fat JAR of your project with all of its dependencies, which will include the PostgreSQL driver too. sbt package, by contrast, only packages your own classes.

It's a CLASSPATH issue; the PostgreSQL JDBC driver isn't available when the class loader tries to load it.

It works locally because the PostgreSQL jar is on the classpath there.


Create your assembly jar file with the command

   sbt assembly

Make sure the assembly jar contains the PostgreSQL driver. If it does not, put your postgresql-xxxx.jdbc4.jar driver in the lib directory of your project


and build it again

   sbt assembly
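
To check whether the assembly jar really contains the driver, you can list its entries and look for the driver class. A minimal sketch; contains_pg_driver is a hypothetical helper that reads a jar listing (one entry per line, as printed by jar tf) on stdin:

```shell
#!/bin/sh
# contains_pg_driver: read a jar listing on stdin (one entry per line,
# as printed by `jar tf`) and succeed only if the PostgreSQL JDBC
# driver class is among the entries.
# (contains_pg_driver is a hypothetical helper name.)
contains_pg_driver() {
    grep -q '^org/postgresql/Driver\.class$'
}
```

Usage, assuming the JDK's jar tool is on your PATH and that the assembly landed at target/scala-2.10/spark-sample-assembly-1.0.jar (that path is an assumption, adjust it to whatever sbt assembly reports):

    jar tf target/scala-2.10/spark-sample-assembly-1.0.jar | contains_pg_driver && echo "driver found"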

Upload your jar file to an HDFS location.


If you are using spark-submit, use this command:

./bin/spark-submit \
--class <main-class> \
--master <master-url> \
hdfs://assembly/myproject.jar

Otherwise, configure your Spark conf in your code

val conf = new SparkConf()

and run your application.

In your case, add the assembly file by calling conf.setJars(Array("hdfs://assembly/myproject.jar")) on the conf from your code:

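// Ship the assembly jar (which bundles the PostgreSQL driver) with the application
val conf = new SparkConf().setAppName("My demo").setMaster("local").setJars(Array("hdfs://assembly/myproject.jar"))
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)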