1 vote

I have a problem connecting to my PostgreSQL database from a Spark application that is launched on a cluster of the Bluemix Apache-Spark service using the spark-submit.sh script.

The code in my Scala file is:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("My demo").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val driver = "org.postgresql.Driver"
val url = "jdbc:postgresql://aws-us-east-1-portal.16.dblayer.com:10394/tennisdb?user=***&password=***"
println("create")
try {
  // load the JDBC driver class, then read the "inputdata" table over JDBC
  Class.forName(driver)
  val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> url, "driver" -> driver, "dbtable" -> "inputdata")).load()
  jdbcDF.show()
  println("success")
} catch {
  case e: Throwable =>
    println(e.toString())
    println("Exception")
}
sc.stop()

I'm using an sbt build file to resolve the dependencies. The build.sbt file is:

name := "spark-sample"

version := "1.0"

scalaVersion := "2.10.4"

// Adding Spark module dependencies

val sparkModules = List("spark-core",
  "spark-streaming",
  "spark-sql",
  "spark-hive",
  "spark-mllib",
  "spark-repl",
  "spark-graphx"
)

val sparkDeps = sparkModules.map(module => "org.apache.spark" % s"${module}_2.10" % "1.4.0")

libraryDependencies ++= sparkDeps

libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"

Then I use the sbt package command to create a jar for my application so I can run it on a cluster with the Bluemix Apache-Spark service. The jar is created successfully and the application runs locally without any errors. But when I submit the application to the Bluemix Apache-Spark service using the spark-submit.sh script, I get a ClassNotFoundException for org.postgresql.Driver.

3 Answers

1 vote

Another easy way to do this: put all the library jars in the directory where your application jar is and tell spark-submit to pick them up.

[charles@localhost tweetoneanalyzer]$ spark-submit --jars $(echo application/*.jar | tr ' ' ',') --class "SparkTweets" --master local[3] application/spark-sample.jar

In the example above, spark-submit uploads all the jars matched by the --jars flag (everything under the application folder) to the server, so you should put any library jars you use there, in your case postgresql-9.1-901-1.jdbc4.jar, and pass the application jar to be run, application/spark-sample.jar, as the final argument.

Thanks,

Charles.

1 vote

You should be using sbt assembly to create the jar file that you run on the cluster.

sbt assembly will create a fat JAR of your project with all of its dependencies, which will include the PostgreSQL driver too.

It's a CLASSPATH issue; the PostgreSQL JDBC driver isn't available when the class loader tries to load it.

It works locally because the PostgreSQL jar is already on the classpath there.
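
A minimal sketch of the sbt-assembly setup, assuming sbt 0.13.x and reusing the sparkDeps value from the question's build file; the plugin version and the merge strategy are illustrative and may need adjusting for your project:

  // project/plugins.sbt -- enables the "assembly" task (plugin version shown is illustrative)
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

  // build.sbt -- mark the Spark modules as "provided" (replacing the plain sparkDeps line)
  // so the fat jar only carries the extra libraries the cluster does not already ship,
  // such as the PostgreSQL driver
  libraryDependencies ++= sparkDeps.map(_ % "provided")
  libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"

  // resolve duplicate files pulled in by overlapping dependencies
  assemblyMergeStrategy in assembly := {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case _                             => MergeStrategy.first
  }

Marking the Spark modules as provided keeps the fat jar small, since the cluster already ships Spark, while the PostgreSQL driver still ends up inside the jar that gets submitted.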

1 vote

Create your assembly jar file using the command:

   sbt assembly

Make sure the assembly jar contains the PostgreSQL driver. If it does not, put your postgresql-xxxx.jdbc4.jar driver in the lib directory of your project:

  /myproject/lib/postgresql-9.1-901-1.jdbc4.jar

and build the assembly again:

   sbt assembly
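
Dropping the jar into lib/ works because sbt treats jars in that directory as unmanaged dependencies and puts them on the classpath, so sbt assembly bundles them into the fat jar. A minimal sketch of the relevant setting, which is already the default and normally does not need to be set:

   // build.sbt -- "lib" is the default directory for unmanaged jars,
   // so this line is optional and shown only for clarity
   unmanagedBase := baseDirectory.value / "lib"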

Upload your jar file to an HDFS location:

 hdfs://assembly/myproject.jar

If you are using spark-submit, use this command:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  hdfs://assembly/myproject.jar

Otherwise, configure the jar in your SparkConf in your code:

val conf = new SparkConf()
  .setMaster(sparkMasterUrl)
  .setJars(Array("hdfs://assembly/myproject.jar"))

and run your application.

In your case, add the assembly jar with conf.setJars(Array("hdfs://assembly/myproject.jar")):

val conf = new SparkConf().setAppName("My demo").setMaster("local")
conf.setJars(Array("hdfs://assembly/myproject.jar"))
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
................................