3
votes

So here's a stupid idea...

I created (many) DAG(s) in Airflow, and it works. However, I would like to package it up somehow so that I could run a single DAG Run without having Airflow installed; i.e. have it self-contained so I don't need all the web servers, databases, etc.

I mostly instantiate new DAG Runs with trigger dag anyway, and I noticed that the overhead of running Airflow appears quite high (workers have high loads while doing essentially nothing, and it can sometimes take tens of seconds before dependent tasks are queued, etc.).

I'm not too bothered about all the logging etc.

3 Answers

1
votes

It sounds like your main concern is the resources wasted by idling workers, more so than the overhead of Airflow itself.

I would suggest running Airflow with the LocalExecutor on a single box. This will give you the benefits of concurrent execution without the hassle of managing workers.

As for the database: there is no way to remove the database component without modifying the Airflow source itself. One alternative would be to use the SequentialExecutor with SQLite, but this removes the ability to run tasks concurrently and is not recommended for production.

0
votes

First, I'd say you need to tweak your Airflow setup.

But if that's not an option, then another way is to write your main logic in code outside the DAG (this is also best practice). For me this makes the code easier to test locally as well.
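To make "logic outside the DAG" concrete, here is a minimal sketch. It assumes the pipeline can be expressed as plain Python functions; the module name `pipeline.py` and the functions `extract`, `transform`, `load` and `run_pipeline` are hypothetical stand-ins, not anything from the question.

```python
# pipeline.py: plain functions with no Airflow imports.
# All names here are hypothetical and only illustrate the layout.

def extract():
    """Stand-in for reading rows from a real source."""
    return [1, 2, 3]


def transform(rows):
    """Stand-in for the real business logic."""
    return [r * 2 for r in rows]


def load(rows):
    """Stand-in for writing to a real sink; returns a summary value."""
    return sum(rows)


def run_pipeline():
    """Run the steps in order, passing results along directly."""
    return load(transform(extract()))


if __name__ == "__main__":
    # Running this file as a script executes the pipeline without Airflow.
    print(run_pipeline())
```

A DAG file could then import these functions and hand them to operators (for example via `PythonOperator`'s `python_callable`), while a local test or a cron job calls `run_pipeline()` directly. Passing data between Airflow tasks would need extra wiring that this sketch leaves out.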

It's pretty easy to write a shell script that ties a few processes together.

You won't get the benefit of operators or dependencies, but you probably can script your way around it. And if you can't, just use Airflow.
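If the dependencies are the part you miss, one way to script around it is a topological sort. The sketch below is an assumption about how that could look, not something Airflow provides: it uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), and the task names and the `run_in_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_in_order(tasks, deps):
    """Run zero-argument callables in dependency order.

    tasks: maps a task name to a callable.
    deps:  maps a task name to the set of task names that must run first.
    Returns the list of task names in the order they were run.
    """
    order = []
    # static_order() yields each node after all of its predecessors
    # and raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical three-step pipeline: extract -> transform -> load.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

if __name__ == "__main__":
    print(run_in_order(tasks, deps))
```

This runs everything sequentially in one process and stops at the first exception, so there are no retries, scheduling or parallelism. If you need those, that is the point where Airflow earns its overhead.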

0
votes

You can create a script which executes Airflow operators directly, although this loses all the metadata that Airflow provides. You still need to have Airflow installed as a Python package, but you don't need to run any webservers, etc. A simple example could look like this:

from dags.my_dag import operator1, operator2, operator3

def main():
    # Execute the pipeline in dependency order:
    # operator1 -> operator2 -> operator3
    # An empty context only works for operators that don't read from it.

    operator1.execute(context={})
    operator2.execute(context={})
    operator3.execute(context={})

if __name__ == "__main__":
    main()