10 votes

I'm currently developing DAGs for Airflow. I like to use PyCharm and tend to spin up a virtual environment for each of my projects.

Airflow depends on an AIRFLOW_HOME folder that gets set during the installation. Subdirectories are then created within this folder by Airflow.

I'm interested in how others structure their projects so that each one can have a virtual environment containing the packages it needs for acquiring data (such as facebookads), while still making it easy to drop the DAGs into Airflow's dags folder for testing.
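For reference, the folder Airflow scans for DAGs is whatever `core.dags_folder` points to (by default a `dags` subdirectory of AIRFLOW_HOME). A small sketch for checking it from the active virtual environment, assuming Airflow is importable there:

```python
# Print the dags_folder this Airflow install will scan, so project DAGs
# can be copied or symlinked there for testing.
from airflow.configuration import conf

print(conf.get("core", "dags_folder"))
```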

2
Funny coincidence, I am also writing a DAG to access Facebook Ads :) I'll try to post my project structure here once I solve import issues similar to what Adam describes in his comment on one of the answers. – DenisFLASH

2 Answers

3 votes

In our case, we follow a simple structure:

- dags
  - dag001.py
  - dag002.py
  - helpers
    - dag_001_helpers
      - file01.py
      - file02.py
    - dag_002_helpers
      - file01.py
      - file02.py
  - configs
    - dag_001_configs
      - file11.json
      - file12.sql
    - dag_002_configs
      - file21.json
      - file22.py
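Since Airflow adds the dags folder itself to the import path, a DAG file can import its code straight from the helpers package. A minimal sketch under the layout above, assuming Airflow 2.x-style imports; the helper function `run_extract` is hypothetical:

```python
# dags/dag001.py - sketch of a DAG pulling code from its helpers package.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Resolvable because the dags folder is on sys.path for the scheduler/workers.
from helpers.dag_001_helpers import file01


def extract(**context):
    # run_extract is a hypothetical function in helpers/dag_001_helpers/file01.py
    file01.run_extract()


with DAG(
    dag_id="dag001",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="extract", python_callable=extract)
```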
1 vote

In my projects I use:

- config
  - config_1.yaml
  - config_1.env
- DAGs
  - dag_1.py
    - dag_1_etl_1.sql
    - dag_1_etl_2.sql
    - dag_1_etl_3.sql
    - dag_1_bash_1.sh
  - dag_2.py
  - dag_3.py
- operators
  - operator_1.py
  - operator_2.py
  - operator_3.py
- hooks
  - hooks_1.py

For our use case:

1) Every object that can be reused is stored in a folder with objects of the same kind (operators with operators, hooks with hooks);

2) Every DAG must be self-contained in terms of SQL, keeping its queries alongside the DAG file, to avoid unmapped dependencies (see the sketch below).
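As an illustration of point 2, a DAG can point Airflow's template engine at its own folder so that the co-located .sql files are the only SQL it depends on. A minimal sketch, assuming the Postgres provider is installed; the connection id is a placeholder:

```python
# dags/dag_1.py - sketch of a DAG whose SQL lives next to it, so it carries
# no SQL dependencies outside its own folder.
import os
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

# Render templates from the folder holding this DAG's .sql files.
SQL_DIR = os.path.dirname(__file__)

with DAG(
    dag_id="dag_1",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    template_searchpath=[SQL_DIR],
) as dag:
    etl_1 = PostgresOperator(
        task_id="etl_1",
        postgres_conn_id="my_dwh",  # placeholder connection id
        sql="dag_1_etl_1.sql",      # resolved via template_searchpath
    )
```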