
I tried setting up a Dataflow streaming job using the "Pub/Sub topic to BigQuery" template. My org has an image constraint policy in place. According to the documentation for image constraints (https://cloud.google.com/compute/docs/images/restricting-image-access#limitations), any image used by a GCP service should not be affected by these constraints. However, the Dataflow workers fail to launch, citing the image constraint as the reason. What is the correct way to set image constraints in such a scenario?
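For reference, the job was launched from the Google-provided template, roughly equivalent to the following command (the job, topic and table names below are placeholders, not the real values):

gcloud dataflow jobs run ps-to-bq-job \
    --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
    --region us-central1 \
    --parameters inputTopic=projects/[PROJECT_ID]/topics/[TOPIC],outputTableSpec=[PROJECT_ID]:[DATASET].[TABLE]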

This is what the error looked like -

{
  insertId: "qnh47fd17tx"
  labels: {
    dataflow.googleapis.com/job_id: "job_id"
    dataflow.googleapis.com/job_name: "job_name"
    dataflow.googleapis.com/region: "us-central1"
  }
  logName: "projects/app/logs/dataflow.googleapis.com%2Fjob-message"
  receiveTimestamp: ""
  resource: {
    labels: {
      job_id: ""
      job_name: ""
      project_id: ""
      region: "us-central1"
      step_id: ""
    }
    type: "dataflow_step"
  }
  severity: "ERROR"
  textPayload: "Workflow failed. Causes: Step "setup_resource_disks_harness50" failed., Step setup_resource_disks_harness50: Set up of resource disks_harness failed, Unable to create data disk(s)., Unknown error in operation 'operation-1600084247324-5af44a52c2574-7f195f5c-376e0b61': [CONDITION_NOT_MET] 'Constraint constraints/compute.trustedImageProjects violated for project getmega-app. Use of images from project dataflow-service-producer-prod is prohibited.'."
  timestamp: ""
}
In order to investigate further, can you paste the error shown in the logs here? - Alexandre Moraes
Included the error log @AlexandreMoraes - Tapish Rathore

1 Answer


Since your project is using image constraints, you also have a trusted image policy configured, so only images sourced from the projects listed in that policy are allowed to start VMs across your organisation.

However, services such as Google Cloud Dataflow and Datalab use images from other Google-owned projects to create VMs within your VPC, which means you may hit this error when launching a templated Dataflow job. This can easily be overcome by adding those projects to your trusted image projects, as follows:

Using gcloud,

1 - Get the existing policy for your project

gcloud beta resource-manager org-policies describe \
    compute.trustedImageProjects --effective \
    --project [PROJECT_ID] > policy.yaml
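(Optional) If you just want to confirm which organization policies are set on the project before editing anything, there is also a list command; this is only a sanity check and is not required for the fix:

gcloud beta resource-manager org-policies list --project [PROJECT_ID]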

2 - Open the policy.yaml file in a text editor. You should see a file like the one below:

constraint: constraints/compute.trustedImageProjects
listPolicy:
  allowedValues:
    - projects/debian-cloud
    - projects/cos-cloud
  deniedValues:
    - projects/unwanted-images

3 - Modify the compute.trustedImageProjects constraint by adding the following projects (an example of the resulting file is shown after this list):

projects/cos-cloud
projects/dataflow-service-producer-prod
projects/serverless-vpc-access-images
projects/windows-cloud 
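With the existing policy from step 2, the edited policy.yaml would then look something like this (keep any values your organisation already relies on):

constraint: constraints/compute.trustedImageProjects
listPolicy:
  allowedValues:
    - projects/debian-cloud
    - projects/cos-cloud
    - projects/dataflow-service-producer-prod
    - projects/serverless-vpc-access-images
    - projects/windows-cloud
  deniedValues:
    - projects/unwanted-images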

Notice that I have added all the projects that Google services may use to retrieve images and launch VMs. In your specific case, just adding projects/dataflow-service-producer-prod would be enough.

4 - Apply the policy.yaml file to your project.

gcloud beta resource-manager org-policies set-policy \
    --project [PROJECT_ID] policy.yaml
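To verify that the change took effect, re-run the describe command from step 1 and check that the new projects now appear under allowedValues:

gcloud beta resource-manager org-policies describe \
    compute.trustedImageProjects --effective \
    --project [PROJECT_ID]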

After performing these actions, you will be able to launch your templated Dataflow job. Alternatively, you can use the Console to add the projects specified in step 3, as described in the documentation.
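If you prefer not to edit a YAML file at all, there is also an allow subcommand that appends a value to the allowed list in one step (double-check the exact syntax against your gcloud version):

gcloud beta resource-manager org-policies allow \
    compute.trustedImageProjects \
    projects/dataflow-service-producer-prod \
    --project [PROJECT_ID]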

Note: be careful when sharing logs that may contain sensitive information such as the project ID or job ID. This information should not be disclosed publicly.