BACKGROUND: I am trying to run a spark-submit job that streams from Kafka and performs a JDBC sink into a Postgres DB on AWS EMR (version 5.23.0), using Scala (version 2.11.12). The errors I see are:
INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on <master-public-dns-name>, executor 1: java.sql.SQLException (No suitable driver found for jdbc:postgres://...
ERROR WriteToDataSourceV2Exec: Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@44dd5258 is aborting.
19/06/20 06:11:26 ERROR WriteToDataSourceV2Exec: Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@44dd5258 aborted.
HYPOTHESIZED PROBLEM: I think the error is telling me that the Postgres JDBC driver cannot be found on the executors, which is why the job cannot sink to Postgres.
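One way to test this hypothesis is to ask the JVM directly which registered JDBC drivers accept the URL. `java.sql.DriverManager` reports "No suitable driver found" when no registered driver accepts the URL it is given. That can happen because the driver class was never loaded, or because the URL is not in a form the driver recognizes. The following is only a minimal sketch, not part of my job: `DriverCheck` and `driversAccepting` are hypothetical names, and the URL is expected as the first command-line argument.

```scala
import java.sql.DriverManager
import scala.collection.JavaConverters._

// Hypothetical diagnostic helper; not part of the streaming job itself.
object DriverCheck {

  /** Class names of the registered JDBC drivers that accept `url`. */
  def driversAccepting(url: String): Seq[String] =
    DriverManager.getDrivers.asScala
      .filter(d => d.acceptsURL(url))
      .map(_.getClass.getName)
      .toList

  def main(args: Array[String]): Unit = {
    // The URL to check is taken from the command line (an assumption of this sketch).
    val url = args(0)

    // Loading the class lets the driver register itself, if the jar is visible here.
    try Class.forName("org.postgresql.Driver")
    catch {
      case _: ClassNotFoundException =>
        println("org.postgresql.Driver is not on this classpath")
    }

    val accepting = driversAccepting(url)
    if (accepting.isEmpty) println(s"No registered driver accepts $url")
    else println(s"Drivers accepting $url: ${accepting.mkString(", ")}")
  }
}
```

Running this on the driver node only shows what that JVM sees. The same `driversAccepting` call would have to run inside a task to show what the executors see.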
PREVIOUS ATTEMPTS: I have already done the following:
- Loaded the driver class in my structured streaming job with
Class.forName("org.postgresql.Driver")
- Added
--jars postgresql-42.1.4.jar \
to my spark-submit job in order to send the jar to the driver and executors. In this attempt, the Postgres driver jar is in my local /home/user_name/ directory.
- Also tried adding
--jars /usr/lib/spark/jars/postgresql-42.1.4.jar \
to my spark-submit job, which is the location where Spark on EMR finds all the jars for execution.
- Started my spark-submit job with
spark-submit --driver-class-path /usr/lib/spark/jars/postgresql-42.1.4.jar:....
- Added
/usr/lib/spark/jars/postgresql-42.1.4.jar
to spark.driver.extraClassPath, spark.executor.extraClassPath, spark.yarn.dist.jars, spark.driver.extraLibraryPath, spark.yarn.secondary.jars, java.library.path, and to the system classpath in general.
- My JDBC connection URL works in Zeppelin but does not work in spark-submit. It is
jdbc:postgres://master-public-dns-name:5432/DBNAME
EXPECTED RESULT: I expect my executors to recognize the postgres driver and sink the data to the postgres DB.
PREVIOUS ATTEMPTS: I've already used the following suggestions to no avail:
- Adding JDBC driver to Spark on EMR
- No Suitable Driver found Postgres JDBC
- No suitable driver found for jdbc:postgresql://192.168.1.8:5432/NexentaSearch