
I am able to create a Google Dataproc cluster from the command line using a custom image:

gcloud beta dataproc clusters create cluster-name --image=custom-image-name

as specified in https://cloud.google.com/dataproc/docs/guides/dataproc-images, but I cannot find any information on how to do the same through the v1beta2 REST API, which I need in order to create a cluster from within Airflow. Any help would be greatly appreciated.

Hi Georges, you can take a look at this URL: cloud.google.com/dataproc/docs/reference/rest/v1beta2/… - Hackerman
Looks like that interface does not know the "image" parameter (yet)? - Georges Kohnen
In the request body, you can build something like { "clusterName": "", "config": { "softwareConfig": { "imageVersion": "" } } }. It seems that imageVersion is the right one. - Hackerman
I think 'imageVersion' actually refers to the Dataproc version (cloud.google.com/dataproc/docs/concepts/versioning/…), not a custom image (which is a beta feature). - Georges Kohnen
You're looking for "imageUri" on masterConfig and workerConfig objects: cloud.google.com/dataproc/docs/reference/rest/v1beta2/… - tix

1 Answer

Custom images can, in principle, reside in a different project, as long as you grant read/use access on the image to whichever service account your Dataproc cluster uses. Because of that, the API currently always needs a full image URI, not just a short name.

gcloud adds syntactic sugar here and resolves the full URI for you automatically; you can see this in action if you add --log-http to your gcloud command:

gcloud beta dataproc clusters create foo --image=custom-image-name --log-http

If you already created a cluster with gcloud, you can also run gcloud dataproc clusters describe on it to see the fully resolved custom image URI.
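
For the REST side, here is a rough sketch of how the request body could be put together in Python, with the full URI set as imageUri on both masterConfig and workerConfig as suggested in the comments. The helper names, the project IDs, and the exact URI layout (the usual Compute Engine resource URL for a global image) are my assumptions; compare the result against what --log-http or describe prints for your own cluster before relying on it.

```python
import json


def full_image_uri(project_id, image_name):
    """Expand a short image name into a full URI.

    Assumption: this follows the usual Compute Engine resource URL layout
    for a global image; verify it against the URI gcloud resolves for you.
    """
    return (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{project_id}/global/images/{image_name}"
    )


def build_cluster_body(project_id, cluster_name, image_name, image_project=None):
    """Build a clusters.create request body that uses a custom image.

    image_project is a hypothetical convenience argument for the case where
    the image lives in a different project than the cluster.
    """
    uri = full_image_uri(image_project or project_id, image_name)
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # imageUri goes on the instance group configs, not softwareConfig.
            "masterConfig": {"imageUri": uri},
            "workerConfig": {"imageUri": uri},
        },
    }


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    body = build_cluster_body("my-project", "cluster-name", "custom-image-name")
    print(json.dumps(body, indent=2))
```

The resulting dictionary is what you would send as the JSON body of the clusters.create call, or hand to whatever Airflow hook or operator you use to issue that request.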