I'm trying to load the MySQL JDBC driver from a Python app. I'm not invoking the 'bin/pyspark' or 'spark-submit' programs; instead, I have a Python script in which I initialize the 'SparkContext' and 'SparkSession' objects. I understand that we can pass the '--jars' option when invoking 'pyspark', but how do I load and specify the JDBC driver in my Python app?
0 votes
Why don't you use pymysql? It's the standard way to connect from Python and can be installed easily with pip: pymysql.readthedocs.io/en/latest
- sun_dare
Thanks. The reason is that I'm using a design in which connections to all DBs (those that can connect via JDBC) go through 'jaydebeapi'.
- codebee
And in this case I need to write my DataFrame to MySQL, for which I need to connect via Spark.
- codebee
Did you try providing the JDBC jar path in the connect call? conn = jdbc.connect(jdbc_class, [url, user, pw], jdbc_path)
- sun_dare
I'm trying to use Spark's DataFrameWriter, which doesn't take a jar file as an option.
- codebee
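For the jaydebeapi side of the discussion, a minimal sketch of what sun_dare suggests might look like the following. The driver class, URL, credentials, and jar path are all placeholders, not values taken from the question, and this assumes a reachable MySQL server:

```python
# Hedged sketch: connect through jaydebeapi, passing the JDBC jar path
# at connect time so no classpath setup is needed beforehand.
# All names below are placeholders.
import jaydebeapi

conn = jaydebeapi.connect(
    'com.mysql.cj.jdbc.Driver',           # driver class inside the jar
    'jdbc:mysql://host:3306/my_db',       # JDBC URL
    ['db_user', 'db_pass'],               # credentials
    '/path/to/mysql-connector-java.jar')  # jar loaded for this connection
curs = conn.cursor()
curs.execute('SELECT 1')
curs.close()
conn.close()
```

This covers plain SQL access, but as the comments note, it does not help with Spark's DataFrameWriter, which needs the jar on Spark's own classpath.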
2 Answers
1
votes
I think you want to do something like this:
from pyspark.sql import SparkSession

# Create a Spark session with the JDBC jar on the classpath
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

# Create a DataFrame to write out over JDBC
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

df.write.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
              mode='overwrite',
              properties={'user': 'db_user',
                          'password': 'db_pass',
                          # naming the driver class explicitly avoids
                          # driver-resolution errors at write time
                          'driver': 'com.mysql.cj.jdbc.Driver'})
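For completeness, reading the table back uses the same session and jar configuration. This is a sketch with the same placeholder host, database, and credentials as above, and assumes the MySQL connector jar path is valid:

```python
# Hedged sketch: read the table back through the same JDBC setup.
# URL, table name, credentials, and jar path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

df = spark.read.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
                     properties={'user': 'db_user',
                                 'password': 'db_pass',
                                 'driver': 'com.mysql.cj.jdbc.Driver'})
df.show()
```

The key point either way is that 'spark.jars' must be set before the session is created; setting it on an already-running SparkSession has no effect.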