I am trying to add the Redshift JARs through a spark-submit option. This is the command I run on Spark 2.1.0:

spark-submit --class Test --master spark://xyz.local:7077 --executor-cores 4 --total-executor-cores 32 --executor-memory 6G --driver-memory 4G --driver-cores 2 --deploy-mode cluster -jars s3a://d11-batch-jobs-on-spark/jars/redshift-jdbc42-1.2.10.1009.jar,s3a://mybucket/jars/spark-redshift_2.11-3.0.0-preview1.jar s3a://mybucket/jars/app.jar

In my code I read from a Redshift table, but I get ClassNotFoundException: com.databricks.spark.redshift.DefaultSource

What am I doing wrong?

1 Answer

I'm having issues with --jars as well...

My advice, for packages available in a Maven repository, is to use --packages instead of --jars, because it also resolves the transitive dependencies of those packages.

USAGE

spark-submit --packages <groupId>:<artifactId>:<version>

In your case, leaving out all the other options and arguments, it would look like this:

spark-submit --packages com.amazon.redshift:redshift-jdbc42:1.2.10.1009
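Since the missing class comes from the spark-redshift connector rather than the JDBC driver, you would probably want both artifacts in the list. Here is a rough sketch of how the full command could be put together. The com.databricks coordinate is my assumption, inferred from the JAR file name in your question, and I have not checked that either artifact resolves from the repositories your cluster can reach. The script only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Maven coordinates in groupId:artifactId:version form.
JDBC_PKG="com.amazon.redshift:redshift-jdbc42:1.2.10.1009"
# Assumption: coordinate inferred from spark-redshift_2.11-3.0.0-preview1.jar.
CONNECTOR_PKG="com.databricks:spark-redshift_2.11:3.0.0-preview1"

# --packages takes a single comma-separated list with no spaces.
PACKAGES="${JDBC_PKG},${CONNECTOR_PKG}"

# Build the command as a string and print it (dry run, nothing is submitted).
CMD="spark-submit --class Test --master spark://xyz.local:7077 --deploy-mode cluster --packages ${PACKAGES} s3a://mybucket/jars/app.jar"
echo "$CMD"
```

If one of the artifacts is not in the default repositories, spark-submit also accepts a --repositories option with a comma-separated list of additional repository URLs.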

You can find the group ID, artifact ID, and version in the XML dependency snippet that the Maven repository page shows once you follow the link to your desired version.

The accepted answer to this question provides more information on --jars and --packages.