BACKGROUND: I am trying to run a spark-submit command that streams from Kafka and performs a JDBC sink into a Postgres DB on AWS EMR (version 5.23.0), using Scala (version 2.11.12). The errors I see are:

INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on <master-public-dns-name>, executor 1: java.sql.SQLException (No suitable driver found for jdbc:postgres://... 

ERROR WriteToDataSourceV2Exec: Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@44dd5258 is aborting.
19/06/20 06:11:26 ERROR WriteToDataSourceV2Exec: Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@44dd5258 aborted.

HYPOTHESIS/PROBLEM: I think the error is telling me that the Postgres JDBC driver cannot be found on the executors, which is why the job cannot sink to Postgres.

PREVIOUS ATTEMPTS: I have already done the following:

  1. Identified my driver in my structured streaming job with Class.forName("org.postgresql.Driver")
  2. Added --jars postgresql-42.1.4.jar \ to my spark-submit command in order to ship the jar to the driver and executors. In this attempt, the Postgres driver jar exists in my local /home/user_name/ directory
  3. Also tried --jars /usr/lib/spark/jars/postgresql-42.1.4.jar \ in my spark-submit command, which is the location where Spark on EMR finds all the jars for execution
  4. Started my spark-submit job with spark-submit --driver-class-path /usr/lib/spark/jars/postgresql-42.1.4.jar:....
  5. Added /usr/lib/spark/jars/postgresql-42.1.4.jar to spark.driver.extraClassPath, spark.executor.extraClassPath, spark.yarn.dist.jars, spark.driver.extraLibraryPath, spark.yarn.secondary.jars, java.library.path, and to the system classpath in general
  6. My JDBC connection works in Zeppelin but does not work in spark-submit. The URL is jdbc:postgres://master-public-dns-name:5432/DBNAME
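For reference, attempts 2 through 5 can be combined into a single spark-submit invocation along these lines. This is a sketch only: the main class, application jar, and master settings are placeholders, not taken from the original command.

```shell
# Hypothetical spark-submit combining the flags from attempts 2-5.
# MyStreamingJob and my-app.jar are placeholder names.
spark-submit \
  --master yarn \
  --jars /usr/lib/spark/jars/postgresql-42.1.4.jar \
  --driver-class-path /usr/lib/spark/jars/postgresql-42.1.4.jar \
  --conf spark.executor.extraClassPath=/usr/lib/spark/jars/postgresql-42.1.4.jar \
  --class MyStreamingJob \
  my-app.jar
```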

EXPECTED RESULT: I expect my executors to recognize the postgres driver and sink the data to the postgres DB.

RELATED QUESTIONS: I've already tried the suggestions from the following posts to no avail:

Adding JDBC driver to Spark on EMR

No Suitable Driver found Postgres JDBC

No suitable driver found for jdbc:postgresql://192.168.1.8:5432/NexentaSearch

1 Answer


Use --packages org.postgresql:postgresql:<VERSION> in your spark-submit command. Spark will resolve the driver from Maven Central and distribute it to both the driver and the executors, so you don't need to manage the jar path yourself.
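Applied to a spark-submit invocation, that looks roughly like the following. The class and jar names are placeholders, and <VERSION> should be replaced with an actual driver version:

```shell
# Hypothetical invocation; MyStreamingJob and my-app.jar are placeholders.
# --packages pulls the PostgreSQL driver from Maven Central and ships it
# to the driver and every executor automatically.
spark-submit \
  --master yarn \
  --packages org.postgresql:postgresql:<VERSION> \
  --class MyStreamingJob \
  my-app.jar
```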