
I created a cluster on Dataproc and it works great. However, after the cluster has been idle for a while (~90 min), the master node automatically stops. This happens to every cluster I create. There is a similar question here: Keep running Dataproc Master node

It looks like the problem is caused by an initialization action. However, that post does not give me enough information to fix the issue. Below is the command I used to create the cluster:

gcloud dataproc clusters create $CLUSTER_NAME \
    --project $PROJECT \
    --bucket $BUCKET \
    --region $REGION \
    --zone $ZONE \
    --master-machine-type $MASTER_MACHINE_TYPE \
    --master-boot-disk-size $MASTER_DISK_SIZE \
    --worker-boot-disk-size $WORKER_DISK_SIZE \
    --num-workers=$NUM_WORKERS \
    --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
    --metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
    --metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
    --scopes cloud-platform \
    --metadata JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
    --optional-components=ANACONDA,JUPYTER \
    --image-version=1.3

I need the BigQuery connector, the GCS connector, Jupyter, and Datalab on my cluster.

How can I keep my master node running? Thank you.

Does your project have any shared GCE startup scripts that might have some kind of auto-shutdown-on-idle logic? Are there any other GCE VMs in your project running any kind of reaper that shuts down other VMs it thinks are idle? If you go to the Stackdriver audit logs under "Compute Engine" and look at the "activity" logs, you should see records of what issued the shutdown commands; was it the Compute Engine default service account that issued them? - Dennis Huo
Yes, it's the default service account (listed under authenticationInfo in the log). However, other instances in the same project, using the same service account, do not shut down automatically when idle. In addition, only the master node is stopped automatically; the worker nodes keep running. - user2830451
I just tested creating a cluster without any initialization actions, in the same project with the same service account. That cluster stays up and running and does not stop automatically. - user2830451
Probably due to Datalab's auto shutdown (default is 90min) cloud.google.com/datalab/docs/concepts/auto-shutdown - Guillem Xercavins
Datalab uses the env var DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS to control auto-shutdown. Can you try running the Docker container with -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true"? - yelsayed

1 Answer


As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple of ways to change this behavior:

  1. After first creating the Datalab-enabled Dataproc cluster, log in to Datalab and click on the "Idle timeout in about ..." text to disable it; the text will change to "Idle timeout is disabled". See https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer
  2. Edit the initialization action to set the environment variable as suggested by yelsayed:

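    # Modified run_datalab from datalab.sh; err, DATALAB_DIR and VOLUME_FLAGS
    # are defined elsewhere in that script.
    function run_datalab(){
      # The added -e flag disables Datalab's idle-timeout (auto-shutdown) process.
      if docker run -d --restart always --net=host -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \
          -v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then
        echo 'Cloud Datalab Jupyter server successfully deployed.'
      else
        err 'Failed to run Cloud Datalab'
      fi
    }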
    

Then use your custom initialization action in place of the stock gs://dataproc-initialization-actions one. It may also be worth filing a tracking issue in the GitHub repo for Dataproc initialization actions, suggesting that the timeout be disabled by default or that an easy metadata-based option be provided. The auto-shutdown behavior is probably not what users expect by default on a Dataproc cluster, since the master also performs roles other than running the Datalab service.
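
If you would rather script the change to the initialization action than edit it by hand, something like the following might work. This is only a sketch: the helper name `patch_datalab_script` is made up for illustration, and it assumes the `docker run` line in your copy of datalab.sh contains `--net=host`, as in the `run_datalab` snippet above. The `init-actions/` path in the commented commands is likewise just a placeholder under your own bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: patch a local copy of datalab.sh so the Datalab
# container is started with the idle-timeout process disabled.
# Assumption: the script's `docker run` line contains `--net=host`.

patch_datalab_script() {
  local script="$1"
  local flag='-e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true"'

  # Nothing to do if the variable is already present in the script.
  if grep -q 'DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS' "${script}"; then
    return 0
  fi

  # Refuse to guess if the expected anchor text is missing.
  if ! grep -q -- '--net=host' "${script}"; then
    echo "anchor '--net=host' not found in ${script}" >&2
    return 1
  fi

  # Insert the flag right after --net=host; a .bak copy of the file is kept.
  sed -i.bak "s|--net=host|--net=host ${flag}|" "${script}"
}

# Typical use (not run here; needs gsutil and write access to your bucket):
#   gsutil cp gs://dataproc-initialization-actions/datalab/datalab.sh .
#   patch_datalab_script datalab.sh
#   gsutil cp datalab.sh "gs://${BUCKET}/init-actions/datalab.sh"
```

After uploading, the cluster would be created with the same command as in the question, with the datalab.sh entry in `--initialization-actions` pointing at your own copy instead of the stock one. It is worth diffing `datalab.sh` against `datalab.sh.bak` before uploading to confirm that only the `docker run` line changed.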