I was able to run more than 165,000 airflow tasks!
But there's a catch. Not all the tasks were scheduled and rendered in a single Airflow DAG.
The problems I faced as I tried to schedule more and more tasks were with the scheduler and the webserver.
Memory and CPU consumption on the scheduler and webserver increased dramatically as more and more tasks were scheduled (which is expected and makes sense). It got to a point where the node couldn't handle it anymore: the scheduler was using over 80 GB of memory for 16,000+ tasks.
So I split the single DAG into two DAGs: a leader/master DAG and a worker DAG.
I have an Airflow Variable that says how many tasks to process at once (for example, num_tasks=10,000). Since I have over 165,000 tasks, the worker DAG processes 10k tasks at a time, in 17 batches.
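The batching arithmetic above can be sketched in a few lines. This is just illustration (the variable names are mine, not from the original setup); the numbers are the ones from the post:

```python
# Sketch of the batching math: 165,000 total tasks split into
# batches of num_tasks=10,000 each (hypothetical names).
from math import ceil

TOTAL_TASKS = 165_000
NUM_TASKS = 10_000  # value of the "num_tasks" Airflow Variable

num_batches = ceil(TOTAL_TASKS / NUM_TASKS)  # 17 batches

# Each batch is a (start, end) slice over the full task list;
# the last batch is smaller (5,000 tasks).
batches = [
    (i * NUM_TASKS, min((i + 1) * NUM_TASKS, TOTAL_TASKS))
    for i in range(num_batches)
]
```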
All the leader DAG does is trigger the same worker DAG over and over with different batches of 10k tasks and monitor each worker DAG run's status. The first trigger operator kicks off the worker DAG with the first batch of 10k tasks and waits until that run completes. Once it's complete, the next operator triggers the same worker DAG with the next batch of 10k tasks, and so on.
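A minimal sketch of what such a leader DAG could look like in Airflow 2.x, using `TriggerDagRunOperator` with `wait_for_completion=True` to get the trigger-then-wait behavior. The DAG ids, the `conf` keys, and reading the `num_tasks` Variable at parse time are all my assumptions for illustration, not the author's actual code:

```python
# Hypothetical leader DAG: triggers the worker DAG once per batch,
# waiting for each worker run to finish before starting the next.
from math import ceil

import pendulum
from airflow import DAG
from airflow.models import Variable
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

TOTAL_TASKS = 165_000
# Batch size comes from an Airflow Variable, e.g. num_tasks=10,000.
num_tasks = int(Variable.get("num_tasks", default_var=10_000))

with DAG(
    dag_id="leader_dag",          # illustrative id
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:
    previous = None
    for batch in range(ceil(TOTAL_TASKS / num_tasks)):
        trigger = TriggerDagRunOperator(
            task_id=f"trigger_batch_{batch}",
            trigger_dag_id="worker_dag",  # illustrative id
            # The worker DAG would read its slice from dag_run.conf.
            conf={"offset": batch * num_tasks, "limit": num_tasks},
            wait_for_completion=True,  # block until the worker run finishes
            poke_interval=60,
        )
        if previous:
            previous >> trigger  # chain batches sequentially
        previous = trigger
```

Chaining the trigger operators sequentially is what guarantees only one batch of 10k tasks is ever scheduled at a time.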
This way, the worker DAG keeps being reused and never has to schedule more than num_tasks tasks at once.
The bottom line is: figure out the maximum number of tasks your Airflow setup can handle, then launch your DAGs in leader/worker fashion, that many tasks at a time, until all the tasks are done.
Hope this was helpful.