4
votes

Due to our business requirements, we are bound to use static, long-running, persistent Dataproc clusters. Is there any way to upgrade the Dataproc image to leverage the latest OS/OSS updates?

Please help me with some reference documentation to carry out this operation (preferably automation).

3
What drives the requirement for persistent clusters? Is it the volume of jobs submitted? Do you store data on the cluster in HDFS? - tix
In your other question you reference "1-namenode and n-number of datanodes" which suggests it is indeed data in HDFS. Any reason this is required, as opposed to storing final job output in GCS or BigQuery? - tix
Actually the huge volume of jobs submitted throughout the day is the reason driving the requirement of persistent cluster. We keep receiving files throughout the day and based on the business logic jobs are kicked-off on regular intervals. - Balajee Venkatesh
"1-namenode & n-datanode" highlights our on-prem cluster config. We haven't migrated to Dataproc yet. I just wanted to check for some references which would help us plan the equivalent Dataproc configuration. - Balajee Venkatesh
I added a few links to my answer that I hope will help. - tix

3 Answers

3
votes

In-place cluster upgrades are not supported by Dataproc today, which is why we advise customers to instead use ephemeral (per-job/workflow) or short-lived clusters (on the order of weeks, not years).

Unfortunately, Oozie does not play well with cloud-native or cloud-hybrid architectures. I would suggest building cluster-failover capabilities into your automation so you can delete/recreate clusters every so often. Perhaps, as part of cluster startup, the new cluster could emit a lock file that prevents the old cluster from spawning new jobs?
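As a rough sketch of what that rotation could look like with the gcloud CLI (the cluster names, region, image version, and GCS lock-file path below are all illustrative placeholders):

```shell
# Illustrative cluster rotation; names, region, and paths are placeholders.

# 1. Bring up the replacement cluster on the newer image.
gcloud dataproc clusters create jobs-cluster-new \
    --region=us-central1 \
    --image-version=2.1-debian12

# 2. Write a lock file so automation stops scheduling onto the old cluster.
gsutil cp /dev/null gs://my-bucket/locks/jobs-cluster-old.lock

# 3. Wait for running jobs to drain, then delete the old cluster.
gcloud dataproc clusters delete jobs-cluster-old \
    --region=us-central1 --quiet
```

The lock-file convention in step 2 is whatever your own automation checks before submitting jobs; Dataproc itself does not interpret it.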

Here are some additional references that may help.

On decoupling compute and storage:

https://www.qubole.com/blog/advantage-decoupling/

https://cloud.google.com/blog/products/storage-data-transfer/hdfs-vs-cloud-storage-pros-cons-and-migration-tips

Options for long-lived clusters:

https://cloud.google.com/blog/products/data-analytics/10-tips-for-building-long-running-clusters-using-cloud-dataproc

See my second answer below for one way to deal with Oozie specifically.

0
votes

The following is a suggestion for what a cloud-hybrid migration of Oozie could look like.

As a first step, I would do a lift-and-shift and focus on separating compute and storage (e.g., replace HDFS with GCS). The sketch below would be step 2 of a migration.

The main blocker with Oozie is that it bundles event triggering and scheduling together. I would move triggering and scheduling out into an external Airflow deployment such as Cloud Composer. You can then parameterize Oozie workflows with file names, reducing each to "run this workflow on this file".

In response to a new file, Airflow runs DataprocWorkflowTemplateOperator (there is also an inline workflow operator if you would rather embed the workflow definition in Airflow). This workflow template contains a single job that triggers the Oozie workflow via pig sh oozie job ....
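Sketched with the gcloud CLI, such a single-step template might look like the following (the template name, region, Oozie server URL, and job.properties path are placeholders):

```shell
# Create an illustrative workflow template whose only step shells out to Oozie.
gcloud dataproc workflow-templates create oozie-trigger \
    --region=us-central1

# Pig's "sh" command runs the Oozie CLI on the cluster; the Oozie URL
# and properties-file path below are placeholders.
gcloud dataproc workflow-templates add-job pig \
    --workflow-template=oozie-trigger \
    --region=us-central1 \
    --step-id=run-oozie \
    --execute="sh oozie job -oozie http://localhost:11000/oozie -config /path/to/job.properties -run"
```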

Here is the part that is relevant to your question, and that gives you the advantages of a cloud migration: the workflow template uses a Cluster Selector, which chooses among one or more clusters based on cluster labels. This means you can use cluster labels to add and remove clusters from a pool. Once labels are removed, new workflow instantiations will not be submitted to the old cluster; once all of its jobs finish, you can delete it (thus upgrading the image). Another advantage is that you can maintain 2+ clusters in different GCP zones and fail over in case of service outages. Also, by decoupling scheduling from execution, you are no longer tied to a single long-lived cluster.

To summarize, by decoupling things into Airflow + Workflow Templates + Oozie you get:

  • Cluster OS and OSS upgrades are possible
  • You can opt to run 2+ clusters in different GCP zones and have failover in case of service outages
  • By decoupling scheduling from execution and storage from compute you're no longer tied to a single long lived cluster
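One possible shape for the label-based pooling summarized above, again using the gcloud CLI (the template name, cluster names, and the pool label are illustrative):

```shell
# Point the template at whichever cluster currently carries pool=active.
gcloud dataproc workflow-templates set-cluster-selector oozie-trigger \
    --region=us-central1 \
    --cluster-labels=pool=active

# Rotating clusters then becomes a label change, not a template change:
gcloud dataproc clusters update jobs-cluster-new \
    --region=us-central1 --update-labels=pool=active
gcloud dataproc clusters update jobs-cluster-old \
    --region=us-central1 --remove-labels=pool

# Delete jobs-cluster-old once its in-flight jobs finish.
```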

0
votes

To upgrade a cluster that continually executes new jobs, you can leverage the user-specified label load balancing feature.

It allows you to route jobs between clusters (old and new) based on user labels that can be applied dynamically, which lets you perform a cluster upgrade without downtime.
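For plain job submission (outside workflow templates), the same idea can be expressed with the --cluster-labels flag on gcloud dataproc jobs submit; the label, region, and example Spark job below are illustrative:

```shell
# Submit by label rather than by cluster name; the job is routed to a
# cluster that currently carries env=prod.
gcloud dataproc jobs submit spark \
    --region=us-central1 \
    --cluster-labels=env=prod \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000

# Upgrading without downtime: create a new cluster carrying env=prod,
# strip the label from the old cluster, then delete it once drained.
```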