
I'm trying to load the MySQL JDBC driver from a Python app. I'm not invoking the 'bin/pyspark' or 'spark-submit' programs; instead I have a Python script in which I initialize the 'SparkContext' and 'SparkSession' objects myself. I understand that we can pass the '--jars' option when invoking 'pyspark', but how do I load and specify the JDBC driver in my Python app?

Why don't you use pymysql? It's the standard way to connect from Python and can be installed easily with pip: pymysql.readthedocs.io/en/latest - sun_dare
Thanks. The reason is that I'm using a design in which connections to all DBs (that can connect via JDBC) go through 'jaydebeapi'. - codebee
And in this case I need to write my DataFrame to MySQL, for which I need to connect via Spark. - codebee
Did you try providing the JDBC path in the connect call? conn = jdbc.connect(jdbc_class, [url, user, pw], jdbc_path) - sun_dare
I'm trying to use Spark's DataFrameWriter, which doesn't take a jar file as an option. - codebee

2 Answers

1 vote

I think you want to do something like this:

from pyspark.sql import SparkSession

# Create a Spark session with the JDBC jar on its classpath
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

# Create a sample DataFrame to write
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

df.write.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
              mode='overwrite',
              properties={'user': 'db_user', 'password': 'db_pass'})
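If the write fails with a "No suitable driver" error, it can help to name the driver class explicitly in the properties. Below is a minimal sketch reusing the same connection details; the host, database, table name and credentials are placeholders, and the driver class shown is the one shipped with MySQL Connector/J 8 (older connectors use com.mysql.jdbc.Driver). The round_trip helper is a hypothetical illustration, not part of the Spark API:

```python
# Placeholder connection details -- replace with your own.
jdbc_url = 'jdbc:mysql://host:3306/my_db'
jdbc_properties = {
    'user': 'db_user',
    'password': 'db_pass',
    # Naming the driver class explicitly avoids "No suitable driver"
    # errors; this is the class name in MySQL Connector/J 8.
    'driver': 'com.mysql.cj.jdbc.Driver',
}

def round_trip(spark, df):
    """Write the DataFrame to MySQL, then read the table back.

    Hypothetical helper: requires a SparkSession created with the
    JDBC jar on its classpath and a reachable MySQL server.
    """
    df.write.jdbc(jdbc_url, 'my_table', mode='overwrite',
                  properties=jdbc_properties)
    return spark.read.jdbc(jdbc_url, 'my_table',
                           properties=jdbc_properties)
```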
0 votes

The answer is to create the SparkContext like this:

from pyspark import SparkConf, SparkContext

spark_conf = SparkConf().set("spark.jars", "/my/path/mysql_jdbc_driver.jar")
sc = SparkContext(conf=spark_conf)

This loads the MySQL driver into the classpath.
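Note that 'spark.jars' must be set before the context is created; once a SparkContext exists, its configuration is frozen. A sketch of one way to organize this, with the configuration collected as plain key/value pairs first (the jar path is the same placeholder as above, and make_context is a hypothetical helper):

```python
# Everything Spark needs must be decided before the context starts;
# collect it here as plain key/value pairs (the jar path is a placeholder).
conf_pairs = {
    'spark.app.name': 'jdbc_app',
    'spark.jars': '/my/path/mysql_jdbc_driver.jar',
}

def make_context():
    """Create a SparkContext carrying the JDBC jar on its classpath.

    Hypothetical helper: imports lazily so the module can be loaded
    without a Spark installation.
    """
    from pyspark import SparkConf, SparkContext
    conf = SparkConf()
    for key, value in conf_pairs.items():
        conf = conf.set(key, value)
    return SparkContext(conf=conf)
```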