I am trying to access Cloud SQL from Dataproc via Cloud SQL Proxy (without using Hive)
After much tinkering based on instructions here: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/cloud-sql-proxy
I got to the point where at least the cluster gets created with no errors and the proxy seems to be installed. However, my Java Spark jobs can't connect to the cluster with this error:
Exception in thread "main" java.sql.SQLException: Access denied for user 'root'@'localhost' (using password: NO)
I deliberately created an instance with NO user password, but it doesn't work for instances with the password either.
What I find strange is that when I access the same database from my local computer, also using a locally running Cloud SQL Proxy, everything works well, but when I try to force a similar error by deliberately submitting the wrong password, I get a similar, but DIFFERENT error, like this:
Exception in thread "main" java.sql.SQLException: Access denied for user 'root'@'cloudsqlproxy~217.138.38.242' (using password: YES)
So, in the Dataproc error, it says root@localhost, whereas in my local proxy the error says root@cloudproxy~IP address. Why is it doing this? It's exactly the same code running in both places. It seems like it's trying to connect to something local within the Dataproc master machine?
What further confirms this is that I don't see this error logged on the server side when the attempt fails on Dataproc, but the error IS logged when I force the failure from local machine. So the Dataproc's proxy doesn't seem to be pointing at the SQL Server?
I created the cluster with the following instructions:
--scopes sql-admin \
--initialization-actions gs://bucket_name/cloud-sql-proxy.sh \
--metadata 'enable-cloud-sql-hive-metastore=false' \
--metadata 'additional-cloud-sql-instances=project_id:europe-west2:sql_instance_id' \
And the connection string that I specify inside the Spark code is like this:
jdbc:mysql://127.0.0.1:3306/database_name
Thanks for your help!
**** Update:
Based on the below suggestion, I modified my connection string to be as follows:
"jdbc:mysql://google/DATABASE_NAME?cloudSqlInstance=INSTANCE_NAME&socketFactory=com.google.cloud.sql.mysql.SocketFactory&useSSL=false&user=root"
However, in this case I get the following error:
Exception in thread "main" java.sql.SQLNonTransientConnectionException: Cannot connect to MySQL server on google:3,306.
Make sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:108)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:95)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:87)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:61)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:71)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:458)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:230)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:226)
How/where is it supposed to get the driver for 'google'? Also, note that it seems to mal-format the default port 3306 and shows it as 3,306? (I tried supplying the port explicitly, but that didnt' help...