4
votes

Can I externally (e.g., via an HTTP request) mark a specific task_id, associated with a dag_id and run_id, as success/failure?

My task is a long-running job on an external system, and I don't want the task to poll that system for its status, since we could have several thousand tasks running at the same time.

Ideally, I want my task to:

  • make an HTTP request to start my external job
  • go to sleep
  • once the job is finished, have it (the external system, or the post-build action of my job) inform Airflow that the task is done, identified by task_id, dag_id, and run_id (see the sketch after this list)
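
Roughly like this hypothetical sketch of the kickoff step (the endpoint and payload below are made-up placeholders, not a real API):

import requests

# Start the long-running job on the external system.
# The URL and payload are placeholders for illustration only.
resp = requests.post(
    "https://external-system.example.com/jobs",
    json={
        "job_name": "my-job",
        # Identifiers the external system should echo back to Airflow later:
        "dag_id": "YOUR-DAG-ID",
        "task_id": "YOUR-TASK-ID",
        "run_id": "YOUR-RUN-ID",
    },
)
resp.raise_for_status()
# The task then sleeps; once the job finishes, the external system (or the
# job's post-build action) calls back to mark this task instance done.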

Thanks

2 Answers

2
votes

You can solve this by sending SQL queries directly to Airflow's metadata DB:

UPDATE task_instance
SET state = 'success',
    try_number = 0
WHERE task_id = 'YOUR-TASK-ID'
  AND dag_id = 'YOUR-DAG-ID'
  AND execution_date = '2019-06-27T16:56:17.789842+00:00';

Notes:

  • The execution_date filter is crucial: Airflow identifies DagRuns by execution_date, not really by their run_id. This means you need to get your DagRun's execution date for the query to match.
  • The try_number = 0 part is included because Airflow will sometimes reset the task back to failed if it notices that try_number has already reached its limit (max_tries).

You can see it in Airflow's source code here: https://github.com/apache/airflow/blob/750cb7a1a08a71b63af4ea787ae29a99cfe0a8d9/airflow/models/dagrun.py#L203
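
For completeness, a minimal sketch of issuing that UPDATE from Python with psycopg2, assuming a PostgreSQL metadata DB (the connection string is a placeholder):

import psycopg2

# Placeholder connection details for the Airflow metadata DB.
conn = psycopg2.connect("dbname=airflow user=airflow password=airflow host=localhost")

MARK_SUCCESS = """
    UPDATE task_instance
    SET state = 'success',
        try_number = 0
    WHERE task_id = %s
      AND dag_id = %s
      AND execution_date = %s;
"""

with conn, conn.cursor() as cur:
    cur.execute(
        MARK_SUCCESS,
        ("YOUR-TASK-ID", "YOUR-DAG-ID", "2019-06-27T16:56:17.789842+00:00"),
    )
conn.close()

If the external system can only speak HTTP, you could put a snippet like this behind a small web endpoint of your own.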

1
vote

Airflow doesn't yet have a REST endpoint for this. However, you have a couple of options:

  • Use the Airflow command-line utility to mark the task as success, e.g., invoked from Python with Popen (see the sketch below).
  • Directly update the Airflow metadata DB table task_instance.
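
A minimal sketch of the first option, assuming the Airflow 1.x CLI is on the PATH (airflow run --mark_success records the task instance as succeeded without actually executing it):

import subprocess

# Mark the task instance as succeeded without running it (Airflow 1.x CLI).
# Popen would work the same way; subprocess.run is just more convenient.
subprocess.run(
    [
        "airflow", "run",
        "YOUR-DAG-ID",
        "YOUR-TASK-ID",
        "2019-06-27T16:56:17.789842+00:00",  # the DagRun's execution date
        "--mark_success",
    ],
    check=True,
)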