How to access Dataproc cluster metadata?

Question

After creating a cluster, I'm trying to retrieve the URLs of my additional components (without using the GCP Dashboard). I am using the Dataproc Python API, and more specifically the get_cluster() function.

The function returns a lot of data, but I cannot find the Jupyter gateway URL or other metadata in it.

from google.cloud import dataproc_v1

project_id, cluster_name = '', ''
region = 'europe-west4'

client = dataproc_v1.ClusterControllerClient(
    client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
    }
)


response = client.get_cluster(project_id, region, cluster_name)
print(response)

Does anyone have a solution to this?

Guillem Xercavins · Accepted Answer · 2020-01-06T15:23:32

If you followed the Dataproc documentation to set up Jupyter access by enabling Component Gateway, you can reach the web interfaces through the gateway URLs. The trick is that these URLs are included in the API response for the v1beta2 version.

The changes needed in the code are minimal (no additional requirements apart from the google-cloud-dataproc library). Just replace dataproc_v1 with dataproc_v1beta2 and access the endpoints with response.config.endpoint_config:

from google.cloud import dataproc_v1beta2

project_id, cluster_name = '', ''
region = 'europe-west4'

client = dataproc_v1beta2.ClusterControllerClient(
    client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
    }
)


response = client.get_cluster(project_id, region, cluster_name)
print(response.config.endpoint_config)

In my case I get:

http_ports {
  key: "HDFS NameNode"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/hdfs/dfshealth.html"
}
http_ports {
  key: "Jupyter"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/"
}
http_ports {
  key: "JupyterLab"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/lab/"
}
http_ports {
  key: "MapReduce Job History"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jobhistory/"
}
http_ports {
  key: "Spark History Server"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/sparkhistory/"
}
http_ports {
  key: "Tez"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/tez-ui/"
}
http_ports {
  key: "YARN Application Timeline"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/"
}
http_ports {
  key: "YARN ResourceManager"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/yarn/"
}
enable_http_port_access: true
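
To pull out a single URL such as the Jupyter one, something like the following should work, assuming http_ports behaves like a name-to-URL mapping, as the output above suggests. The helper name find_endpoint and the sample dictionary are illustrative only and not part of the Dataproc library; with a real cluster you would pass response.config.endpoint_config.http_ports instead of sample_ports.

```python
from typing import Mapping, Optional


def find_endpoint(http_ports: Mapping[str, str], name: str) -> Optional[str]:
    """Return the URL of the web interface called `name`, or None if absent.

    The comparison ignores case, so "jupyter" matches the key "Jupyter".
    """
    wanted = name.casefold()
    for key, url in http_ports.items():
        if key.casefold() == wanted:
            return url
    return None


# Stand-in for response.config.endpoint_config.http_ports (assumption:
# the real field can be iterated like a dict of name -> URL).
sample_ports = {
    "Jupyter": "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/",
    "JupyterLab": "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/lab/",
    "YARN ResourceManager": "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/yarn/",
}

jupyter_url = find_endpoint(sample_ports, "jupyter")
print(jupyter_url)
```

A component that is not enabled on the cluster simply yields None, so the caller can decide whether that is an error.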
