
I'm setting up Airflow so that the webserver runs on one machine and the scheduler runs on another. Both share the same MySQL metastore database. Both instances come up without any errors in the logs, but the scheduler is not picking up any DAG runs that are created by manually triggering the DAGs via the web UI.

The dag_run table in MySQL shows a few entries, all in the running state:

mysql> select * from dag_run;
+----+--------------------------------+----------------------------+---------+------------------------------------+------------------+----------------+----------+----------------------------+
| id | dag_id                         | execution_date             | state   | run_id                             | external_trigger | conf   | end_date | start_date                 |
+----+--------------------------------+----------------------------+---------+------------------------------------+------------------+----------------+----------+----------------------------+
|  1 | example_bash_operator          | 2017-12-14 11:33:08.479040 | running | manual__2017-12-14T11:33:08.479040 |                1 | ��       }�.    | NULL     | 2017-12-14 11:33:09.000000 |
|  2 | example_bash_operator          | 2017-12-14 11:38:27.888317 | running | manual__2017-12-14T11:38:27.888317 |                1 | ��       }�.    | NULL     | 2017-12-14 11:38:27.000000 |
|  3 | example_branch_dop_operator_v3 | 2017-12-14 13:47:05.170752 | running | manual__2017-12-14T13:47:05.170752 |                1 | ��       }�.    | NULL     | 2017-12-14 13:47:05.000000 |
|  4 | example_branch_dop_operator_v3 | 2017-12-15 04:26:07.208501 | running | manual__2017-12-15T04:26:07.208501 |                1 | ��       }�.    | NULL     | 2017-12-15 04:26:07.000000 |
|  5 | example_branch_dop_operator_v3 | 2017-12-15 06:12:10.965543 | running | manual__2017-12-15T06:12:10.965543 |                1 | ��       }�.    | NULL     | 2017-12-15 06:12:11.000000 |
|  6 | example_branch_dop_operator_v3 | 2017-12-15 06:28:43.282447 | running | manual__2017-12-15T06:28:43.282447 |                1 | ��       }�.    | NULL     | 2017-12-15 06:28:43.000000 |
+----+--------------------------------+----------------------------+---------+------------------------------------+------------------+----------------+----------+----------------------------+
6 rows in set (0.21 sec)

But the scheduler, started on the other machine and connected to the same MySQL DB, never picks up these DAG runs and turns them into task instances.

I'm not sure what I'm missing in the setup here, so a few questions:

  1. When and how is the DAGs folder located at $AIRFLOW_HOME/dags populated? I think it's when the webserver is started. But then, if I just start the scheduler on another machine, how will the DAGs folder on that machine be filled up?
  2. Currently, I'm running airflow initdb only on the machine hosting the webserver and not on the scheduler machine. I hope that is correct.

Can I enable debug logs for the scheduler to get more output that could indicate what's missing? From the current logs it looks like it just looks in the DAGs folder on the local system and finds no DAGs there (not even the example ones), in spite of load_examples being set to True in the config.
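To double-check what the scheduler actually sees, here is what I can run on the scheduler machine (a sketch using the Airflow 1.x DagBag API; DagBag() reads dags_folder from that machine's airflow.cfg):

from airflow.models import DagBag

# Inspect what this host can actually parse from its local dags folder.
bag = DagBag()
print(bag.dag_ids)        # DAG ids found locally (the example DAGs should show up here too if load_examples is on)
print(bag.import_errors)  # files that failed to parse, if any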

I don't think it matters, but I'm currently using the LocalExecutor.

Any help is appreciated.

Edit: I know that I need to sync up the DAGs folder across machines, as the Airflow docs suggest, but I'm not sure whether this is the reason why the scheduler is not picking up the tasks in the above case.

I think it's expected that you deploy your DAGs (.py scripts) into the $AIRFLOW_HOME/dags directory, as specified in the airflow.cfg file. The scheduler is the service that looks in the dags directory, picks them up and adds them to the DagBag. To turn on debug, you can edit the settings.py file, set LOGGING_LEVEL to logging.DEBUG and restart the services (a sketch of this change follows these comments), e.g. github.com/apache/incubator-airflow/blob/master/airflow/… I think it's best to ignore those example DAGs and just try creating one based on the tutorial. - Davos
My point here is that if airflow.cfg says to load the examples, then the scheduler should also work even if there are no self-created DAGs in the DAGs folder. - Agraj
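For reference, the settings.py edit mentioned in the first comment would look roughly like this (a sketch against Airflow 1.x's airflow/settings.py; newer releases expose a logging_level option in airflow.cfg instead):

import logging

# Sketch of the change described in the comment above (Airflow 1.x airflow/settings.py).
# Restart the webserver and scheduler afterwards for the new level to take effect.
LOGGING_LEVEL = logging.DEBUG  # default is logging.INFO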

1 Answer


OK, I got the answer. It looks like the scheduler does not query the DB until there are DAGs in the local DAG folder. The code in jobs.py looks like:

# Task instances are only ever selected for dag_ids present in the scheduler's
# local DagBag (simple_dag_bag). If the local DAGs folder is empty,
# simple_dag_bag.dag_ids is empty and this query returns nothing, no matter
# what dag_run rows exist in the metastore.
ti_query = (
    session
    .query(TI)
    .filter(TI.dag_id.in_(simple_dag_bag.dag_ids))
    .outerjoin(DR,
               and_(DR.dag_id == TI.dag_id,
                    DR.execution_date == TI.execution_date))
    .filter(or_(DR.run_id == None,
                not_(DR.run_id.like(BackfillJob.ID_PREFIX + '%'))))
    .outerjoin(DM, DM.dag_id == TI.dag_id)
    .filter(or_(DM.dag_id == None,
                not_(DM.is_paused)))
)

I added a simple DAG to the local DAG folder on the machine hosting the scheduler, and it started picking up the other DAG runs as well.
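For illustration, something as small as the following placed in $AIRFLOW_HOME/dags on the scheduler machine is enough to make the local DagBag non-empty (a sketch using the Airflow 1.x API; the dag_id is arbitrary):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Minimal placeholder DAG; its only purpose is to give the scheduler's local
# DagBag at least one entry. The dag_id is arbitrary.
dag = DAG(
    dag_id='dummy_unblock_scheduler',
    start_date=datetime(2017, 12, 1),
    schedule_interval=None,  # never scheduled on its own
)

noop = DummyOperator(task_id='noop', dag=dag)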

We raised an issue for this - https://issues.apache.org/jira/browse/AIRFLOW-1934