1 vote

I am trying to access Cloud SQL from Dataproc via Cloud SQL Proxy (without using Hive)

After much tinkering based on the instructions here: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/cloud-sql-proxy

I got to the point where the cluster is created without errors and the proxy appears to be installed. However, my Java Spark jobs can't connect to the Cloud SQL instance, failing with this error:

Exception in thread "main" java.sql.SQLException: Access denied for user 'root'@'localhost' (using password: NO)

I deliberately created an instance with NO password for the root user, but it doesn't work for instances with a password either.

What I find strange is this: when I access the same database from my local computer, also through a locally running Cloud SQL Proxy, everything works. But when I force a similar failure by deliberately submitting a wrong password, I get a similar but DIFFERENT error:

Exception in thread "main" java.sql.SQLException: Access denied for user 'root'@'cloudsqlproxy~217.138.38.242' (using password: YES)

So in the Dataproc error it says root@localhost, whereas with my local proxy the error says root@cloudsqlproxy~<IP address>. Why is that? It's exactly the same code running in both places. It looks as if, on Dataproc, it's trying to connect to something local on the master machine?

What further confirms this is that the failed attempt from Dataproc is NOT logged on the server side, whereas the forced failure from my local machine IS. So the proxy on Dataproc doesn't seem to be pointing at the Cloud SQL server at all?

I created the cluster with the following flags:

--scopes sql-admin \
--initialization-actions gs://bucket_name/cloud-sql-proxy.sh \
--metadata 'enable-cloud-sql-hive-metastore=false' \
--metadata 'additional-cloud-sql-instances=project_id:europe-west2:sql_instance_id' \

And the connection string that I specify inside the Spark code is like this:

jdbc:mysql://127.0.0.1:3306/database_name
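
The connection itself is opened with plain JDBC, roughly like this (a minimal sketch; class and database names are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    // Minimal sketch of the connection attempt; names are placeholders.
    public class ProxyConnectTest {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "root");
            // The instance was created with no root password, hence the empty string.
            props.setProperty("password", "");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/database_name", props)) {
                System.out.println("Connected: " + conn.getMetaData().getURL());
            }
        }
    }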

Thanks for your help!

**** Update:

Based on the suggestion below, I modified my connection string as follows:

"jdbc:mysql://google/DATABASE_NAME?cloudSqlInstance=INSTANCE_NAME&socketFactory=com.google.cloud.sql.mysql.SocketFactory&useSSL=false&user=root"

However, in this case I get the following error:

Exception in thread "main" java.sql.SQLNonTransientConnectionException: Cannot connect to MySQL server on google:3,306.

Make sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.


    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:108)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:95)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:87)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:61)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:71)
    at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:458)
    at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:230)
    at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:226)

How/where is it supposed to get the driver for 'google'? Also, note that it seems to mangle the default port 3306 and show it as 3,306. (I tried supplying the port explicitly, but that didn't help...)

hey VS_FF, did you manage to access Cloud SQL from Dataproc via Cloud SQL Proxy using Scala/Java? - mike
It was a while ago, but as far as I remember I gave up after much pain and switched away from using Cloud SQL for this task. - VS_FF

2 Answers

1 vote

I followed the instructions in the tutorial you shared, and both the Cloud SQL instance and the Dataproc cluster were created successfully. The validation step also ran without problems:

$ gcloud dataproc jobs submit pyspark --cluster githubtest pyspark_metastore_test.py
Job [63d2e1ef8c9f45ae818c135c775dcf93] submitted.
Waiting for job output...
18/08/22 17:21:51 INFO org.spark_project.jetty.util.log: Logging initialized @3074ms
...
Successfully found table table_mdhw in Cloud SQL Hive metastore                 
18/08/22 17:22:53 INFO org.spark_project.jetty.server.AbstractConnector: Stopped Spark@5061d2ce{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Job [63d2e1ef8c9f45ae818c135c775dcf93] finished successfully.

I only got the same error as yours when I set a different password for root. Could you update the root password and run the following command from the master node?

 mysql -u root -h 127.0.0.1 -p

In my environment, the command above connects successfully. If that works for you, please check this link for the further steps needed to connect your Java application: authentication and the mysql-connector-java connector are required in addition.
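
For the Java side, a minimal sketch of such a connection, assuming mysql-connector-java is on the classpath (database name and password are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Sketch: connect through the proxy on the master with explicit credentials.
    // Assumes mysql-connector-java is on the classpath; names are placeholders.
    public class MasterProxyCheck {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://127.0.0.1:3306/database_name",
                         "root", "your_root_password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                rs.next();
                System.out.println("Connected, SELECT 1 returned " + rs.getInt(1));
            }
        }
    }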

Hope it helps!

0 votes

I ran into the same issues, with the exact same symptoms (Access Denied on localhost instead of cloudsqlproxy~*, and google:3,306).

SSH-ing in and looking at /var/log/cloud-sql-proxy/cloud-sql-proxy.log, I saw that cloud-sql-proxy wasn't actually starting; port 3306 was apparently already in use for some reason. I added =tcp:3307 to the end of the instance connection name in additional-cloud-sql-instances, and I was up and running.
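
With the proxy listening on 3307, the JDBC URL in the job presumably just needs the matching port; a minimal sketch (database name and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    // Sketch: the proxy now listens on 3307 instead of 3306
    // (after additional-cloud-sql-instances=...=tcp:3307).
    public class ProxyAltPort {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3307/database_name",
                    "root", "your_root_password")) {
                System.out.println("Connected via proxy on 3307");
            }
        }
    }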

I never managed to get the SocketFactory URIs working. If changing the port doesn't work, others elsewhere have suggested using VPC.