
I'm running Spark in a Docker container (sequenceiq/spark).
I launched it like this:

docker run --link dbHost:dbHost  -v my/path/to/postgres/jar:postgres/ -it -h sandbox sequenceiq/spark:1.6.0 bash

I'm sure that the PostgreSQL database is accessible at postgresql://user:password@localhost:5432/ticketapp.

I start the shell with spark-shell --jars postgres/postgresql-9.4-1205.jdbc42.jar. Since my Play! application connects fine using the dependency "org.postgresql" % "postgresql" % "9.4-1205-jdbc42", I believe I have the correct jar. (I also don't get any warning saying that the local jar does not exist.)

But when I try to connect to my database with:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql://dbHost:5432/ticketapp?user=user&password=password", 
    "dbtable" -> "events")
  ).load()

(I also tried the URL jdbc:postgresql://user:root@dbHost:5432/ticketapp.)

Doing this as explained in the Spark documentation, I get this error: java.sql.SQLException: No suitable driver found for jdbc:postgresql://dbHost:5432/ticketapp?user=simon&password=root

What am I doing wrong?


1 Answer


As far as I know, you need to include the JDBC driver for your particular database on the Spark classpath. According to the documentation (https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases), it should be done like this:
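To check whether the driver class is visible at all, here is a small sketch (not from the Spark docs) that uses only the JDK's java.sql.DriverManager; the object name DriverCheck is made up for illustration. A "not on the classpath" result means the jar never reached this JVM's classpath. A positive result from the shell is weaker evidence: DriverManager also checks that a driver was loaded by a class loader visible to the caller, so Spark's own classes may still fail to see a jar that was added only with --jars.

```scala
import java.sql.{DriverManager, SQLException}
import scala.util.{Failure, Success, Try}

// Hypothetical diagnostic helper, not part of Spark or the PostgreSQL driver.
object DriverCheck {
  // Reports whether the JVM can load `driverClass` and whether
  // java.sql.DriverManager then accepts some registered driver for `url`.
  def check(driverClass: String, url: String): String =
    Try(Class.forName(driverClass)) match {
      case Failure(_: ClassNotFoundException) =>
        s"$driverClass is not on the classpath"
      case Failure(e) =>
        s"could not load $driverClass: ${e.getMessage}"
      case Success(_) =>
        // getDriver throws SQLException when no registered driver accepts the URL
        Try(DriverManager.getDriver(url)) match {
          case Success(d) =>
            s"DriverManager will use ${d.getClass.getName}"
          case Failure(e: SQLException) =>
            s"$driverClass loaded, but DriverManager has no driver for $url: ${e.getMessage}"
          case Failure(e) =>
            s"unexpected error: ${e.getMessage}"
        }
    }
}
```

In spark-shell, DriverCheck.check("org.postgresql.Driver", "jdbc:postgresql://dbHost:5432/ticketapp") would show which of these cases applies.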

SPARK_CLASSPATH=postgresql-9.3-1102-jdbc41.jar bin/spark-shell
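A commonly suggested variant, sketched here as an assumption rather than something the linked docs state for this exact error: SPARK_CLASSPATH is deprecated since Spark 1.0 (Spark logs a warning suggesting --driver-class-path instead), so you could launch the shell with the jar passed to both --driver-class-path and --jars, and also name the driver class explicitly through the JDBC source's "driver" option so that Spark loads that class itself rather than relying only on DriverManager's discovery. The helper below just builds the options map; JdbcOptions.postgres is a hypothetical name, and the credentials are the placeholders from the question.

```scala
// Hypothetical helper that builds the options map for sqlContext.read.format("jdbc").
object JdbcOptions {
  def postgres(host: String, port: Int, db: String, table: String,
               user: String, password: String): Map[String, String] =
    Map(
      // Credentials passed as URL query parameters, as in the question
      "url" -> s"jdbc:postgresql://$host:$port/$db?user=$user&password=$password",
      "dbtable" -> table,
      // Names the JDBC driver class explicitly instead of relying on discovery
      "driver" -> "org.postgresql.Driver"
    )
}
```

Assuming the shell was started with spark-shell --driver-class-path postgres/postgresql-9.4-1205.jdbc42.jar --jars postgres/postgresql-9.4-1205.jdbc42.jar, the read from the question would become val jdbcDF = sqlContext.read.format("jdbc").options(JdbcOptions.postgres("dbHost", 5432, "ticketapp", "events", "user", "password")).load().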