
OS: CentOS

Spark: 1.6.1

sbt: build.sbt

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
    "com.amazonaws" % "aws-java-sdk" % "1.10.75",
    "com.amazonaws" % "amazon-kinesis-client" % "1.1.0",
    "com.amazon.redshift" % "jdbc4" % "1.1.7.1007" % "test"
)

resolvers ++= Seq(
    "redshift" at "https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC4-1.1.7.1007.jar"
)
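(Note: two things in this build.sbt may contribute to the error. An sbt `at` resolver is supposed to point at a repository root, not directly at a jar file, and the `"test"` scope keeps the driver off the runtime classpath. A sketch of one common workaround, using sbt's `from` to attach the jar URL directly to the dependency:)

```scala
// build.sbt sketch (assumption: pulling the jar directly with sbt's `from`;
// the "test" scope is dropped so the driver is available at runtime)
libraryDependencies += ("com.amazon.redshift" % "jdbc4" % "1.1.7.1007"
  from "https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC4-1.1.7.1007.jar")
```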

Spark app:

val redshiftDriver = "com.amazon.redshift.jdbc4.Driver"
Class.forName(redshiftDriver)

I've specified the Redshift driver and updated the connection URL etc., following the official AWS documentation here: http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-in-code.html

But I'm still getting error below:

java.sql.SQLException: No suitable driver found for jdbc:redshift://xxx.us-west-2.redshift.amazonaws.com:5439

I googled and someone said the jar should be added to the classpath? Could anyone please help here? Thank you very much.

I received java.sql.SQLException: No suitable driver. The code works fine locally but not on EMR. I had to add .option("driver", "com.amazon.redshift.jdbc42.Driver") to the connection options to make it run on EMR. - J. P
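(To illustrate the commenter's fix: when the driver jar is on the classpath but DriverManager can't discover it automatically, Spark's JDBC data source lets you name the driver class explicitly. A sketch only; the host, database, table, and credentials below are placeholders, and the driver class should match the jar version you bundle:)

```scala
// Sketch: naming the Redshift JDBC driver explicitly on a Spark JDBC read,
// which bypasses DriverManager's automatic driver discovery.
import org.apache.spark.sql.{DataFrame, SQLContext}

def readFromRedshift(sqlContext: SQLContext): DataFrame =
  sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:redshift://xxx.us-west-2.redshift.amazonaws.com:5439/dev?user=username&password=pass")
    .option("driver", "com.amazon.redshift.jdbc4.Driver") // jdbc42.Driver if using the JDBC 4.2 jar
    .option("dbtable", "my_table")
    .load()
```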

1 Answer


Solved:

I cleaned out all the cached artifacts and rebuilt everything from scratch, and then it worked.

Add-on:

Databricks has implemented the spark-redshift library, which makes interacting with Redshift from Spark much easier: https://github.com/databricks/spark-redshift

// Get some data from a Redshift table
val df: DataFrame = sqlContext.read
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://redshifthost:5439/database?user=username&password=pass")
    .option("dbtable", "my_table")
    .option("tempdir", "s3n://path/for/temp/data")
    .load()
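(The library also supports writing a DataFrame back to Redshift with the same options. A sketch under the same assumptions as the read example above; the host, table name, and S3 tempdir are placeholders:)

```scala
// Sketch: writing a DataFrame to a Redshift table via spark-redshift.
// The library stages data in the S3 tempdir, then issues a Redshift COPY.
df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://redshifthost:5439/database?user=username&password=pass")
  .option("dbtable", "my_table_copy")
  .option("tempdir", "s3n://path/for/temp/data")
  .mode("error") // fail if the table already exists
  .save()
```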