1
votes

I am trying to connect to Google Cloud SQL from a job running on Google Cloud Dataproc. I have not authorized access to the Cloud SQL instance from external networks. Since my Dataproc cluster is in the same project as the Cloud SQL instance, I expected the connection to be allowed.

I followed the docs for connecting to Cloud SQL (https://cloud.google.com/appengine/docs/standard/java/cloud-sql/), although they describe connecting from GAE rather than from Dataproc. I tried the steps anyway, but com.mysql.jdbc.GoogleDriver does not seem to be available in the Dataproc environment, so I get a ClassNotFoundException for this class.

Where can I get this package? I would like to include it in the uber jar and run that on the Dataproc cluster.

2
Are you already including the dependencies mysql-connector-java and mysql-socket-factory in your pom as described in the linked doc? - Axel Magnuson
I have not included mysql-connector-java because the doc says it is only used locally. - Golak Sarangi

2 Answers

0
votes

After doing some reading, it sounds like GoogleDriver is only available in the context of an App Engine application. Outside of App Engine, the usage pattern is a little different. From the first link:

String jdbcUrl = String.format(
    "jdbc:mysql://google/%s?cloudSqlInstance=%s&"
        + "socketFactory=com.google.cloud.sql.mysql.SocketFactory",
    databaseName,
    instanceConnectionName);

Connection connection = DriverManager.getConnection(jdbcUrl, username, password);
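
For reference, here is a minimal self-contained sketch of the snippet above. The class name CloudSqlJdbc and its method names are hypothetical, chosen only for illustration. It assumes the mysql-connector-java and mysql-socket-factory artifacts mentioned in the comments are bundled in the uber jar; without them, connect() will fail at runtime.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper class; the name and method signatures are illustrative only.
final class CloudSqlJdbc {

    // Builds the JDBC URL in the same form as the snippet quoted above.
    static String buildJdbcUrl(String databaseName, String instanceConnectionName) {
        return String.format(
            "jdbc:mysql://google/%s?cloudSqlInstance=%s&"
                + "socketFactory=com.google.cloud.sql.mysql.SocketFactory",
            databaseName,
            instanceConnectionName);
    }

    // Opens a connection through the standard DriverManager.
    // Assumption: the MySQL driver and the Cloud SQL socket factory are on the classpath.
    static Connection connect(String databaseName, String instanceConnectionName,
                              String username, String password) throws SQLException {
        String jdbcUrl = buildJdbcUrl(databaseName, instanceConnectionName);
        return DriverManager.getConnection(jdbcUrl, username, password);
    }
}
```

The caller is expected to close the returned Connection, for example with try-with-resources. The instance connection name is the value Cloud SQL shows for the instance, typically of the form project:region:instance.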
0
votes

To solve this problem, you need to add the jar containing the JDBC driver to the Spark driver classpath: https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases

To do so, upload the jar to GCS, pass its path in the --jars argument (so that Dataproc distributes it to all nodes in the cluster), and add it to the Spark driver classpath with the --properties argument when submitting the Spark job through Dataproc:

$ gcloud dataproc jobs submit spark ... \
    --jars=gs://<BUCKET>/<DIRECTORIES>/<JAR_NAME> \
    --properties=spark.driver.extraClassPath=<JAR_NAME>
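
To confirm that the jar actually ended up on the driver classpath, a small check like the one below could be called at the start of the job. This is only a sketch: ClasspathCheck and isOnClasspath are hypothetical names, and the check only consults the class loader that loaded the helper itself, which may differ from the loader Spark uses for jars passed via --jars.

```java
// Hypothetical diagnostic helper; not part of Spark or Dataproc.
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized, so no static
    // initializers (such as JDBC driver registration) are run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For example, logging isOnClasspath("com.google.cloud.sql.mysql.SocketFactory") before opening the connection would show whether the socket factory from the first answer is visible, which helps separate a classpath problem from a network or authorization problem.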