I am creating a Dataproc Hive cluster with the following command:

gcloud dataproc clusters create hive-cluster \
    --scopes sql-admin \
    --image-version 1.3 \
    --master-boot-disk-size 15 \
    --num-workers 0 \
    --initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties hive:hive.metastore.warehouse.dir=gs://project-warehouse/datasets \
    --metadata "hive-metastore-instance=$PROJECT:$REGION:hive-metastore" \
    --initialization-action-timeout 30m

But the initialization script fails with the error "ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)".

When I SSH into the cluster, I am able to connect to MySQL through the command "mysql -h localhost -u root".

I followed this article: https://cloud.google.com/solutions/using-apache-hive-on-cloud-dataproc

I have also granted the permissions mentioned in this question (link).

2 Answers

1
votes

We suspect the problem is that "systemctl start cloud-sql-proxy" can return before the proxy server is ready to accept connections, so the script's next MySQL connection attempt is refused (error 111 is "connection refused").

The fix in this PR, now confirmed, waits until the proxy server is ready: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/pull/356
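
For anyone who needs a workaround in a custom copy of the script, here is a minimal sketch of the general "wait until ready" idea. It is not the code from the PR: the helper name wait_until, the retry count, and the interval are all assumptions made for illustration. It simply retries a probe command until it succeeds or the attempts run out.

```shell
#!/bin/sh
# wait_until RETRIES INTERVAL COMMAND [ARGS...]
# Hypothetical helper (not taken from the PR): runs COMMAND up to
# RETRIES times, sleeping INTERVAL seconds after each failed attempt.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  retries=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # Discard the probe's output; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Possible usage after starting the proxy (values are assumptions):
#   systemctl start cloud-sql-proxy
#   wait_until 30 2 mysql -h localhost -u root -e "SELECT 1" \
#     || { echo "cloud-sql-proxy did not become ready" >&2; exit 1; }
```

The probe reuses the same mysql command from the question, so the script only proceeds once a real connection through the proxy works.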

0
votes

I ran into the same issue recently. It seems the script tries to use the connection before the proxy has finished starting. I also tested on Dataproc image 1.2, and the same issue occurred.

thanks