17
votes

In Airflow, how should I handle the error "This DAG isn't available in the webserver DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database"?

I've copied a new DAG to an Airflow server, and have tried several things to get it to show up.

The scheduler log shows it being processed with no errors, and I can interact with it and view its state through the CLI, but it still does not appear in the web UI.

Edit: the webserver and scheduler are running on the same machine with the same airflow.cfg. They're not running in Docker.

They're run by Supervisor, which runs them both as the same user (airflow). The airflow user has read, write and execute permission on all of the dag files.

5
Adding more info about your setup could help with debugging this issue. Are you running the webserver and scheduler on the same machine with the same airflow.cfg? Are they running in Docker with the volume mounted? Information like that might make it easier to see how the disconnect is happening. – jhnclvr
@jhnclvr sure, I've added some details. Not sure what else to say about the server. – Ollie Glass
This solution might be helpful: stackoverflow.com/questions/52934625/… – knutole

5 Answers

6
votes

This helped me...

pkill -9 -f "airflow scheduler"

pkill -9 -f "airflow webserver"

pkill -9 -f "gunicorn"

Then restart the Airflow scheduler and webserver.

5
votes

Just had this issue myself. After changing permissions, resetting the metadata database, restarting the webserver, and even making some code changes that might rectify the situation, the DAG still didn't show up.

However, I noticed that even though we were stopping the webserver, our gunicorn processes were still running. Killing those processes and then starting everything back up resulted in success.
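If you want to confirm whether leftover workers are the culprit before killing anything, one quick check (a sketch only; psutil is already pulled in as an Airflow dependency, but ps and grep work just as well) is:

import psutil

# Print any process whose command line mentions gunicorn
for proc in psutil.process_iter(attrs=["pid", "cmdline"]):
    cmdline = " ".join(proc.info.get("cmdline") or [])
    if "gunicorn" in cmdline:
        print(proc.info["pid"], cmdline)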

2
votes

I had the same problem with an Airflow instance installed from a Docker image.

What I did was:

1- Delete all .pyc files

2- Delete the DAG's entries from the metadata database using the following (a fuller, runnable sketch is shown after the steps):

for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    sql = "delete from {} where dag_id='{}'".format(t, dag_input)
    hook.run(sql, True)

3- Restart the webserver and scheduler

4- Run airflow upgradedb

It resolved the problem for me.
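For reference, step 2 above could look roughly like this end to end. The Postgres backend, the airflow_db connection id, and the dag_id value are assumptions for illustration, since the original snippet doesn't define hook or dag_input:

# Sketch only: purge one DAG's rows from the Airflow metadata DB
from airflow.hooks.postgres_hook import PostgresHook

dag_input = "my_dag_id"                             # placeholder: the dag_id you want to remove
hook = PostgresHook(postgres_conn_id="airflow_db")  # placeholder connection to the metadata DB
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    sql = "delete from {} where dag_id='{}'".format(t, dag_input)
    hook.run(sql, True)                             # second argument enables autocommit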

0
votes

If the dags_folder config parameter (under airflow_home) is the same for the scheduler, the web UI, and the command line interface, then the only causes of the error

This DAG isn't available in the webserver DagBag object

are file permissions or an error in the Python script.

Please check:

  • Run the DAG file as a normal Python script and check for errors (see the sketch below)
  • The user in airflow.cfg and the one creating the DAG should be the same, or the DAG file should have execute permission for the airflow user
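For the first check, one option is to build a DagBag the same way the webserver does and look at its import errors. This is a sketch; the path below is a placeholder for the dags_folder in your airflow.cfg:

from airflow.models import DagBag

dagbag = DagBag(dag_folder="/home/airflow/dags")  # placeholder path
print(dagbag.import_errors)   # file -> traceback for DAGs that failed to parse
print(list(dagbag.dags))      # dag_ids that were loaded successfully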
-1
votes

With Airflow 1.9 I don't experience the problem with zombie gunicorn processes.

I do a simple restart, systemctl restart airflow-webserver, and it forces the webserver to refresh the DAG status.