How can I upgrade Apache Hive to version 3 on GCP Apache Spark Dataproc Cluster

Question

For one reason or another, I want to upgrade Apache Hive from 2.3.4 to 3 on a Google Cloud Dataproc (1.4.3) Spark cluster. How can I upgrade Hive while maintaining compatibility with the Cloud Dataproc tooling?

Dennis Huo · Accepted Answer · 2019-05-08T19:23:44

Unfortunately, there's no real way to guarantee compatibility with such customizations, and there are known incompatibilities between currently released Spark versions and Hive 3.x, so you'll likely run into problems unless you've managed to cross-compile all the versions you need yourself.

In any case, if you only need a limited subset of functionality working, the easiest approach is simply to drop your custom jar files into:

/usr/lib/hive/lib/

on all your nodes via an init action. You may need to reboot your master node afterwards so that the Hive metastore and HiveServer2 pick up the new jars, or at least run:

sudo systemctl restart hive-metastore
sudo systemctl restart hive-server2

on your master node.
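As a rough illustration of the steps above, an init action along these lines could copy the jars and restart the services. This is only a sketch under assumptions: the bucket path `gs://my-bucket/hive3-jars` is a placeholder, the helper names (`run`, `install_jars`, `restart_hive_if_master`, `main`) and the `DRY_RUN` switch are made up for the example, and it relies on the `dataproc-role` metadata attribute that Dataproc init actions commonly use to tell the master from the workers. Init actions run as root, so `sudo` is left out.

```shell
#!/bin/sh
# Sketch of a Dataproc init action: copy custom Hive jars to every node,
# then restart the Hive services on the master only.
set -eu

# ASSUMPTION: placeholder bucket path; point it at your own jar files.
JAR_SOURCE="${JAR_SOURCE:-gs://my-bucket/hive3-jars}"
HIVE_LIB_DIR="${HIVE_LIB_DIR:-/usr/lib/hive/lib}"
# Hypothetical switch for this example: DRY_RUN=1 prints commands instead of running them.
DRY_RUN="${DRY_RUN:-0}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Copy the custom jars into Hive's lib directory (runs on every node).
# The wildcard is quoted so that gsutil, not the shell, expands it.
install_jars() {
  run gsutil -m cp "${JAR_SOURCE}/*.jar" "${HIVE_LIB_DIR}/"
}

# Restart the metastore and HiveServer2, but only when the role is "Master".
restart_hive_if_master() {
  if [ "$1" = "Master" ]; then
    run systemctl restart hive-metastore
    run systemctl restart hive-server2
  fi
}

main() {
  # The metadata helper is only available on the cluster nodes themselves.
  role="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"
  install_jars
  restart_hive_if_master "$role"
}

# Add a final line calling `main` when using this as a real init action.
```

As written, the block only defines functions, so it can be tried off-cluster with `DRY_RUN=1` before a `main` call is added. Jars from the stock Hive 2.3.4 install stay in place unless you remove them, so conflicting versions on the classpath are still possible.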

For Spark issues, you may also need your own custom build of Spark, replacing the jar files under:

/usr/lib/spark/jars/
