I have been working with Airflow (on Cloud Composer) for a year now, and I have difficulty figuring out how the (Celery) workers know what actions to perform when they receive a task to execute.
From what I understand:
- We put some DAGs in the /dags folder (a minimal example is sketched right after this list).
- The scheduler, in a loop, parses the DAGs and saves the results in the metadata DB; it also determines whether a task from a DAG has to run based on its dependencies.
- If some tasks have to run, the executor sends them to a queue that the Celery workers listen to.
- One of the Celery workers picks up the task and does the job.
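
To make the question concrete, here is a minimal example of the kind of DAG I mean; dag_to_exec / task_to_exec are just placeholder names matching the log below, and the import path may differ between Airflow 1.10 and 2.x:

```python
# Hypothetical minimal DAG file: dags/dag_to_exec.py
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # airflow.operators.python in Airflow 2.x

def do_the_work():
    print("doing the actual work")

with DAG(
    dag_id="dag_to_exec",
    start_date=datetime(2021, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="task_to_exec", python_callable=do_the_work)
```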
But how does the Celery worker know what to execute? I can see a log line saying:
[2021-06-30 12:58:59,814] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'dag_to_exec', 'task_to_exec', '2021-06-30T12:57:09+00:00', '--job_id', '2822201', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/dag_to_exec.py', '--cfg_path', '/tmp/tmpank91zop']
Please correct me if I'm wrong, but is the part '-sd', 'DAGS_FOLDER/dag_to_exec.py' there to tell the Airflow worker "execute this task from this DAG, which is saved at this path"? So the Airflow worker also needs to parse the DAG to understand it, right? I say "also" because the scheduler already parsed it earlier.
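
For what it's worth, my current mental model of what that command makes the worker do is roughly the following. This is only my guess, sketched with the public DagBag API, not the actual Airflow source:

```python
# My guess at what the worker-side 'airflow run ... --raw -sd ...' boils down to
# (illustration only, not the real implementation).
from airflow.models import DagBag

# Re-parse the single DAG file passed via -sd (placeholder path; on the worker
# DAGS_FOLDER would resolve to the configured dags folder).
dag_bag = DagBag(dag_folder="/path/to/dags/dag_to_exec.py")

dag = dag_bag.get_dag("dag_to_exec")   # pick the DAG out of the parsed file
task = dag.get_task("task_to_exec")    # pick the operator instance
# ...then, I assume, a task instance for execution date 2021-06-30T12:57:09+00:00
# is built and the operator's execute() is eventually called.
```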
If you have links to share, or can point me to the part of the source code to look at to understand this, thanks in advance!