2 votes

I am struggling to perform some really simple task with Airflow.

For context, I use docker-compose to run docker containers with Airflow and Postgres. (https://github.com/puckel/docker-airflow)

I am trying to test the integration of one of our in-house libraries with Airflow. The not-very-clean method I use to test quickly is to docker exec into the Airflow container and pip install the appropriate library (which is shared from the host machine to the container through a Docker volume mounted read-only).

Everything is installed properly with pip and I can use my library when running a dummy Python script.

However, when I integrate the same logic in a DAG Python file, I get the error "Broken DAG: No module named 'inhouse_lib'".

At first I thought that Airflow was picking up dependencies from a pip directory specific to the Python version, and that I had installed the library into a different one.

But all my Python binaries use Python 3.7.

For all the pip binaries I have (pip, pip3, pip3.7), running pip list shows my in-house library.

I fail to understand how I am supposed to deploy my library so that Airflow can pick it up. Any insights would be appreciated.

Thanks for your help.

Edit: To clarify what I am trying to do, here are some details. In my DAG, I want to use a custom Python library (let's call it myLib) with a feature that is not yet implemented. Once it is implemented, I want to deploy the latest version of myLib into the Airflow container.

I updated the docker-compose.yml with a volume that maps the host directory containing myLib into the Airflow home inside the container.
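A minimal sketch of what such a volume entry might look like (the host path ./myLib and the mount point under puckel's default AIRFLOW_HOME of /usr/local/airflow are assumptions, not the asker's actual paths):

# docker-compose.yml (excerpt) -- illustrative paths only
services:
  webserver:
    image: puckel/docker-airflow
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./myLib:/usr/local/airflow/myLib:ro   # read-only, as described above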

# Go into the container
docker exec -it <airflow docker container ID> bash

# Install myLib into the Python environment
pip install myLib

# Check the installation
pip list | grep myLib  # outputs myLib

# Check the import in the Python REPL
python
>>> import myLib  # No Python error

The same import does not work in my Airflow DAG. When checking container logs, I have the following error:

[2019-08-30 15:14:30,499] {{__init__.py:51}} INFO - Using executor LocalExecutor
[2019-08-30 15:14:30,894] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-08-30 15:14:30,897] {{dagbag.py:205}} ERROR - Failed to import: /usr/local/airflow/dags/mydag.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 202, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/airflow/dags/mydag.py", line 7, in <module>
    import myLib
ModuleNotFoundError: No module named 'myLib'
[2019-08-30 15:14:31 +0000] [167] [INFO] Handling signal: ttou
[2019-08-30 15:14:31 +0000] [11446] [INFO] Worker exiting (pid: 11446)

Were you ever able to figure this out? I'm running into the same issue. I can run python and import everything I need with no errors, but when Airflow tries to load the DAG, it gives me a bunch of "module not found" errors. It's been driving me crazy! – wymangr
We are getting the same issue with the latest Airflow version. Can anybody suggest something here? – Aviral Kumar

2 Answers

0 votes

You need to test each DAG before running it.

You can use the following CLI commands to check the environment and the code logic:

airflow list_dags
airflow test [dag_id] [task_id] [execution_date]

Per your question, you are likely facing an environment dependency issue. You can check it by running airflow list_dags inside the Docker container. To solve it, there are two ways:

  1. Set the dags folder in the airflow.cfg file and put your module files inside the dags folder.
  2. Check the Python path inside the Airflow environment and make sure your module can be accessed, as in the sketch below.
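A quick way to make that comparison (a sketch; <container_id> and myLib stand in for your own values):

# Where does the container's Python look for modules?
docker exec -it <container_id> python -c "import sys; print(sys.path)"

# Where did pip actually put the package? Check the Location: field.
docker exec -it <container_id> pip show myLib

If pip's Location is not on the sys.path of the interpreter Airflow runs, the import will fail even though pip list shows the package.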

Update 1: In order to check whether your module is properly installed, you can use the following steps:

  1. Find your Airflow image: docker images | grep [your airflow image name]
  2. Note the image ID.
  3. Start a Python REPL from that image: docker run -it [image id] python
  4. In that Python environment, check whether your module is properly installed, e.g. import myLib. If you get any error message, double-check your module installation via pip.
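Concretely, with the puckel image from the question (the tag is an assumption, and this relies on the image's entrypoint passing unrecognized commands through to exec):

docker images | grep airflow          # find the image name or ID
docker run -it --rm puckel/docker-airflow python
>>> import myLib                      # an error here means the image itself lacks the package

Note that a container started fresh from the image will not contain anything you pip-installed into a different, already-running container; that is one reason baking the dependency into the image (see the other answer) is more reliable.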

Update 2: In order to check your dependency, you can write a simple DAG and use airflow test [dag_id] [task_id] [execution_date] to see if it works, as in the sketch below.
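A minimal probe DAG for this check (a sketch assuming the Airflow 1.10 API visible in the question's logs; the dag_id, task_id, and myLib names are placeholders):

# /usr/local/airflow/dags/probe_import.py
# Fails at parse time ("Broken DAG") if the scheduler's interpreter
# cannot see the package, which is exactly what we want to test.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

import myLib


def show_location():
    # Print where the module was actually loaded from.
    print(myLib.__file__)


dag = DAG(
    dag_id="probe_import",
    start_date=datetime(2019, 8, 1),
    schedule_interval=None,
)

PythonOperator(
    task_id="show_location",
    python_callable=show_location,
    dag=dag,
)

Then run it once with: airflow test probe_import show_location 2019-08-30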

In my understanding, you could try to build your Airflow image from scratch; that may work well. The Docker container may assume a specific environment and user.

If you prefer to keep using the existing container, you can try switching to the same user ID when you log in to install your Python lib, like the following:

docker exec -u [user_name, which you can find in the Dockerfile] [container_id] [your command]
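For example, assuming the Dockerfile declares USER airflow (as puckel's image does):

docker exec -u airflow -it <container_id> pip install --user myLib   # installs under that user's ~/.local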

Don't forget to commit every change to a new image ID and start the container from that new image ID; otherwise you may lose your changes every time the container is recreated.
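A sketch of that commit-and-reload cycle (the image name my-airflow:with-mylib is illustrative):

docker commit <container_id> my-airflow:with-mylib   # freeze the installed library into a new image
# then start the container from my-airflow:with-mylib instead of the original image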

0 votes

Build the puckel Dockerfile with the following:

docker build --rm --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .

Add the Python dependencies you want to install via pip to the PYTHON_DEPS build argument as a comma-separated list.
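For example, to also bake in the in-house library from the question (myLib is a placeholder, and the comma-separated form follows the convention described above):

docker build --rm --build-arg PYTHON_DEPS="flask_oauthlib>=0.9,myLib" -t puckel/docker-airflow .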

This builds the image with your dependencies installed, and you can then use them in your DAGs with just import yourpackage.