I'm trying to load the MySQL JDBC driver from a Python app. I'm not invoking the 'bin/pyspark' or 'spark-submit' programs; instead, I have a Python script in which I initialize the 'SparkContext' and 'SparkSession' objects. I understand that we can pass the '--jars' option when invoking 'pyspark', but how do I load and specify the JDBC driver in my Python app?
0 votes
Why don't you use pymysql? It's the standard way to connect from Python and can be installed easily with pip: pymysql.readthedocs.io/en/latest
- sun_dare
Thanks. The reason is that I'm using a design in which connections to all DBs (those that can connect via JDBC) go through 'jaydebeapi'.
- codebee
And in this case I need to write my DataFrame to MySQL, for which I need to connect via Spark.
- codebee
Did you try providing the JDBC jar path in the connect call? conn = jdbc.connect(jdbc_class, [url, user, pw], jdbc_path)
- sun_dare
I'm trying to use Spark's DataFrameWriter, which doesn't take a jar file as an option.
- codebee
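For the jaydebeapi side of the discussion, a minimal sketch of what sun_dare suggests might look like the following. The driver class, URL, credentials, and jar path are all placeholders, not values taken from the question, and this assumes a reachable MySQL server:

```python
# Hedged sketch: connect through jaydebeapi, passing the JDBC jar path
# at connect time so no classpath setup is needed beforehand.
# All names below are placeholders.
import jaydebeapi

conn = jaydebeapi.connect(
    'com.mysql.cj.jdbc.Driver',           # driver class inside the jar
    'jdbc:mysql://host:3306/my_db',       # JDBC URL
    ['db_user', 'db_pass'],               # credentials
    '/path/to/mysql-connector-java.jar')  # jar loaded for this connection
curs = conn.cursor()
curs.execute('SELECT 1')
curs.close()
conn.close()
```

This covers plain SQL access, but as the comments note, it does not help with Spark's DataFrameWriter, which needs the jar on Spark's own classpath.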
2 Answers
1
votes
I think you want to do something like this:
from pyspark.sql import SparkSession

# Create a Spark session with the JDBC jar on the classpath
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

# Create a DataFrame to write out over JDBC
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

df.write.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
              mode='overwrite',
              properties={'user': 'db_user',
                          'password': 'db_pass',
                          # naming the driver class explicitly avoids
                          # driver-resolution errors at write time
                          'driver': 'com.mysql.cj.jdbc.Driver'})
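For completeness, reading the table back uses the same session and jar configuration. This is a sketch with the same placeholder host, database, and credentials as above, and assumes the MySQL connector jar path is valid:

```python
# Hedged sketch: read the table back through the same JDBC setup.
# URL, table name, credentials, and jar path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

df = spark.read.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
                     properties={'user': 'db_user',
                                 'password': 'db_pass',
                                 'driver': 'com.mysql.cj.jdbc.Driver'})
df.show()
```

The key point either way is that 'spark.jars' must be set before the session is created; setting it on an already-running SparkSession has no effect.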