1
votes

Apache Beam Python SDK upgrade to 2.11.0 issue .

I am upgrading the sdk from 2.4.0 to 2.11.0 using requirements.txt. It has dependencies as below:

    apache_beam==2.11.0
    google-cloud-dataflow==2.4.0
    httplib2==0.11.3
    google-cloud==0.27.0
    google-cloud-storage==1.3.0
    workflow

For managing the dependencies in beam pipeline we have this txt file. There are two vm instance on google compute engine , one is master other is worker. These instances will install all packages listed in the requirements.txt file.

The jobs are run through DataflowRunner. If running the code manually using command as

python code.py --project --setupFilePath --requirementFilePath --workerMachineType n1-standard-8 --runner DataflowRunner.

The job is not upgrading the version to 2.11.0 , rather it fails .Error Message in stackdriver logs:

2019-03-26 19:02:02.000 IST
Failed to install packages: failed to install requirements: exit status 1
Expand all | Collapse all {
 insertId:  "27857323862365974846:1225647:0:438995"  
 jsonPayload: {
  line:  "boot.go:144"   
  message:  "Failed to install packages: failed to install requirements: exit status 1"   
 }
 labels: {
  compute.googleapis.com/resource_id:  "278567544395974846"   
  compute.googleapis.com/resource_name:  "icf-20190334132038-03260625-b9fa-harness-gtml"   
  compute.googleapis.com/resource_type:  "instance"   
  dataflow.googleapis.com/job_id:  "2019-03-26_06_25_16-6068768320191854196"   
  dataflow.googleapis.com/job_name:  "icf-20190326132038"   
  dataflow.googleapis.com/region:  "global"   
 }
 logName:  "projects/project-id/logs/dataflow.googleapis.com%2Fworker-startup"  
 receiveTimestamp:  "2019-03-26T13:32:07.627920858Z"  
 resource: {
  labels: {
   job_id:  "2019-03-26_06_25_16-6068768320191854196"    
   job_name:  "icf-20190326132038"    
   project_id:  "project-id"    
   region:  "global"    
   step_id:  ""    
  }
  type:  "dataflow_step"   
 }
 severity:  "CRITICAL"  
 timestamp:  "2019-03-26T13:32:02Z"  
}

Note : When running the pip install apache-beam==2.11.0 on both worker and master , the code runs.*

1
The issue appearing with beam2.11.0 , 2.10,2.9 version. I downgraded the version to 2.8.0 and it worked using airflow 1.8.1Neha0908

1 Answers

0
votes

I am not sure but most likely the issue here without seeing the rest of the logs. Is an incompatible dependency. Are you able to run the pipeline locally and see if you have any dep issues?