
I want to explore my data in Redshift using a Zeppelin notebook. A small EMR cluster with Spark is running behind it. I am loading Databricks' spark-redshift library


and then

import org.apache.spark.sql.DataFrame

val query = "..."

val url = "..."
val port=5439
val table = "..."
val database = "..."
val user = "..."
val password = "..."

val df: DataFrame = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", s"jdbc:redshift://${url}:$port/$database?user=$user&password=$password")
  .option("query", query)
  .option("tempdir", "s3n://.../tmp/data")
  .load()


but I get the error

java.lang.ClassNotFoundException: Could not load an Amazon Redshift JDBC driver; see the README for instructions on downloading and configuring the official Amazon driver

I added the option

option("jdbcdriver", "com.amazon.redshift.jdbc41.Driver")

but that did not help. I think I need to specify Redshift's JDBC driver somewhere, as I would by passing --driver-class-path to spark-shell, but how do I do that with Zeppelin?

Given the error message, it looks like you need to do some extra configuration. Have you actually followed the instructions from that README? This is not a generic error message, but one produced by the Redshift JDBC driver; so you have it installed, but it is just missing some configuration (or additional libraries?). - Mark Rotteveel

1 Answer


You can add external jars, such as the JDBC driver, together with their dependencies using either Zeppelin's dependency-loading mechanism or, in the case of Spark, the %dep dynamic dependency loader:

When your code requires an external library, instead of downloading it, copying it, and restarting Zeppelin, you can do the following using the %dep interpreter:

  • Load libraries recursively from a Maven repository
  • Load libraries from the local filesystem
  • Add an additional Maven repository
  • Automatically add libraries to the Spark cluster (this can be turned off)

The latter would look something like:

// loads with all transitive dependencies from Maven repo

// or add artifact from filesystem

and, by convention, has to be in the first paragraph of the note.
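
As a rough sketch, such a %dep paragraph might look like the following. It assumes the `z.load` and `z.reset` calls of Zeppelin's %dep interpreter; the Maven coordinate (Scala build suffix and `<version>` placeholder) and the local jar path are illustrative assumptions, not values taken from the question, so substitute the ones that match your cluster and the driver you downloaded.

```scala
%dep
// Optional: clear artifacts loaded by an earlier %dep run.
z.reset()

// Load spark-redshift with all transitive dependencies from a Maven repo.
// Assumed coordinate: replace the Scala suffix and <version> with the
// build that matches your Spark installation.
z.load("com.databricks:spark-redshift_2.10:<version>")

// Or add an artifact from the filesystem, here the Redshift JDBC driver.
// Hypothetical path: point it at the jar you downloaded from Amazon.
z.load("/path/to/RedshiftJDBC41.jar")
```

The %dep paragraph only takes effect if it runs before the Spark interpreter starts, so if a %spark paragraph has already been executed, restart the interpreter first. The read from the question can then go in a later paragraph, keeping `.option("jdbcdriver", "com.amazon.redshift.jdbc41.Driver")` if the driver class is still not picked up automatically.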