I'm running into a problem with a simple BigQuery ETL pipeline that is launched from a Flask app on Google App Engine in the flexible environment.

It works when I run it locally. I start the app with flask run or gunicorn -b :$PORT main:app, open an endpoint in my browser, fill in the page, and submit a form. The POST handler for the page then invokes the Apache Beam pipeline. All of that works fine.

But when I deploy it with gcloud app deploy and try to access any endpoint, I get a 502 error and the logs show the following:

2018-10-04 14:03:39 default[20181003t232620]  Traceback (most recent call last):
  File "/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/env/local/lib/python2.7/site-packages/gunicorn/workers/base.py", line 129, in init_process
    self.load_wsgi()
  File "/env/local/lib/python2.7/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/env/local/lib/python2.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/env/local/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
    return self.load_wsgiapp()
  File "/env/local/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/env/local/lib/python2.7/site-packages/gunicorn/util.py", line 350, in import_app
    __import__(module)
  File "/home/vmagent/app/main.py", line 15, in <module>
    import rw_bigquery_etl
  File "/home/vmagent/app/rw_bigquery_etl.py", line 9, in <module>
    import apache_beam as beam
  File "lib/apache_beam/__init__.py", line 88, in <module>
    from apache_beam import coders
  File "lib/apache_beam/coders/__init__.py", line 19, in <module>
    from apache_beam.coders.coders import *
  File "lib/apache_beam/coders/coders.py", line 30, in <module>
    from apache_beam.coders import coder_impl
ImportError: lib/apache_beam/coders/coder_impl.so: invalid ELF header
2018-10-04 14:03:39 default[20181003t232620]  [2018-10-04 14:03:39 +0000] [8] [INFO] Worker exiting (pid: 8)
2018-10-04 14:03:39 default[20181003t232620]  [2018-10-04 14:03:39 +0000] [1] [INFO] Shutting down: Master
2018-10-04 14:03:39 default[20181003t232620]  [2018-10-04 14:03:39 +0000] [1] [INFO] Reason: Worker failed to boot.

The actual error is raised by the line from apache_beam.coders import coder_impl:

ImportError: lib/apache_beam/coders/coder_impl.so: invalid ELF header

I've had a lot of dependency issues recently, so I just ran pip freeze > requirements.txt in the project folder, giving me this (pastebin). I installed those packages into a lib folder inside the project folder and have the line vendor.add('lib') in appengine_config.py. This is my app.yaml:

runtime: python
api_version: 1
threadsafe: true
env: flex
entrypoint: gunicorn -b :$PORT main:app

runtime_config:
  python_version: 2

handlers:
- url: /.*
  script: main.app
  login: required

How can I resolve this issue, or go about troubleshooting it?

I'm new to Google Cloud and pip, so I'm still trying to understand how the cloud environment works, especially with Python packages.

1 Answer

Consolidating Python dependencies/requirements for Apache Beam is uniquely frustrating.

It would be helpful to see:

  1. your pipeline config
  2. how you launch your pipeline locally
  3. how you launch your pipeline remotely (the request handler code that launches it)
  4. where your pipeline code sits relative to your project root

But it sounds like the requirements.txt you set up is being used for your GAE flex instance and not for your Dataflow workers. Possibly you supplied requirements.txt as a command-line option when running locally, and your server code is not supplying that same option.
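One way to rule that out is to build the pipeline options in a single place and use the same function both locally and in the request handler. The following is only a sketch under assumptions: build_pipeline_args is a hypothetical helper, and the project ID and bucket are placeholders. The flags themselves (--runner, --project, --temp_location, --requirements_file, --setup_file) are standard Beam Python pipeline options.

```python
import os


def build_pipeline_args(project, temp_location,
                        requirements_file=None, setup_file=None):
    """Return Beam-style flags so every launch path uses the same options.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    args = [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--temp_location={}".format(temp_location),
    ]
    # Tells Beam which pip requirements to install on the workers.
    if requirements_file:
        args.append(
            "--requirements_file={}".format(os.path.abspath(requirements_file)))
    # Alternative: ship the pipeline as a package described by a setup.py.
    if setup_file:
        args.append("--setup_file={}".format(os.path.abspath(setup_file)))
    return args


# Placeholder values; substitute your own project and bucket.
pipeline_args = build_pipeline_args(
    "my-project", "gs://my-bucket/tmp", requirements_file="requirements.txt")

# In the handler (and in the local launcher) you would then do roughly:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(pipeline_args)
#   with beam.Pipeline(options=options) as p:
#       ...
```

If the local launch and the POST handler both go through this one function, a missing dependency option can no longer be the difference between the two environments.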

Look at my answer here:

https://stackoverflow.com/a/51312281/4458510

I've had the best luck using a setup.py for my pipeline's dependencies, like they do in this example: https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset