2
votes

I am using Spark to run some computations over my data and then push the results to Hive. Cloud Dataproc version 1.2 ships with Hive 2.1, but the MERGE command in Hive is only supported from version 2.2 onwards, so I have to use the preview image version for my Dataproc cluster. With version 1.2 I can create the cluster without any issue, but with the preview version I get the error "Failed to bring up Cloud SQL Metastore". The initialisation script is here. Has anyone ever run into this problem before?

hive-metastore.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled hive-metastore
mysql.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable mysql
insserv: warning: current start runlevel(s) (empty) of script `mysql' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `mysql' overrides LSB defaults (0 1 6).
Created symlink /etc/systemd/system/multi-user.target.wants/cloud-sql-proxy.service → /usr/lib/systemd/system/cloud-sql-proxy.service.
Cloud SQL Proxy installation succeeded
hive-metastore.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled hive-metastore
[2018-06-06T12:43:55+0000]: Failed to bring up Cloud SQL Metastore
1
A couple of diagnostics would be helpful. The last line seems to indicate that the command hive -e 'show tables;' failed. Can you SSH into the master node, run that command, and see if there is any interesting output? Also, journalctl -u cloud-sql-proxy.service will show the logs for the Cloud SQL proxy, which may contain useful information. - Angus Davis
Hi, Angus Davis. The answer given by @tix helped me solve the problem. Thank you for helping. - Y.Su

1 Answer

1
votes

I believe the issue may be that your metastore was initialized from an older version of Dataproc and thus has an outdated schema.

If you no longer have the failed cluster, create a new one as before (you can use the --single-node option to reduce cost). Then SSH to the master node and upgrade the schema:

$ gcloud compute ssh my-cluster-m

$ /usr/lib/hive/bin/schematool -dbType mysql -info
Hive distribution version:       2.3.0
Metastore schema version:        2.1.0    <-- you will need this

org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is
not compatible. Hive Version: 2.3.0, Database Schema Version: 2.1.0
*** schemaTool failed ***

$ /usr/lib/hive/bin/schematool -dbType mysql -upgradeSchemaFrom 2.1.0

Unfortunately, this cluster cannot be returned to a running state, so please delete it and recreate it.
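
If you would rather not copy the schema version by hand, the two schematool steps above can be chained. The following is only a minimal sketch: the function names extract_schema_version and build_upgrade_command are hypothetical helpers, not part of Hive or Dataproc, and the parsing assumes the -info output looks like the sample shown above.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers,
# not part of Hive or Dataproc.

# Read `schematool -info` output on stdin and print the metastore
# schema version (assumes a line starting with "Metastore schema version:").
extract_schema_version() {
  awk -F':' '/^Metastore schema version:/ {
    v = $2
    sub(/^[ \t]+/, "", v)   # drop leading whitespace
    sub(/[ \t].*$/, "", v)  # drop anything after the version number
    print v
    exit
  }'
}

# Print (but do not run) the upgrade command for a given schema version.
build_upgrade_command() {
  printf '/usr/lib/hive/bin/schematool -dbType mysql -upgradeSchemaFrom %s\n' "$1"
}

# Example use on the master node, commented out so that sourcing
# this file has no side effects:
# version="$(/usr/lib/hive/bin/schematool -dbType mysql -info 2>&1 | extract_schema_version)"
# [ -n "$version" ] && build_upgrade_command "$version"
```

The sketch prints the upgrade command instead of running it, so you can check the detected version before changing the metastore.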

I have created this PR to make the issue more discoverable: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/pull/278