We have an Airflow DAG running on an hourly schedule, with tasks updating and overwriting date-partitioned tables in BigQuery.
After changing the queries and/or schemas of these tables, we want to backfill several days' worth of existing partitions. Backfilling every run, however, would be a huge waste of effort, since each hourly run just overwrites the same daily partition 24 times before moving on to the next day.
We can use airflow list_dag_runs to list all runs and filter down to the last one for each day, but is there a way to backfill/clear only these last runs per day, without rerunning all 24 instances for each day?
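The per-day filtering step could be sketched as a small shell pipeline, assuming the execution dates have already been extracted from the airflow list_dag_runs output as one ISO timestamp per line (the helper name last_per_day and the sample timestamps are illustrative):

```shell
# Keep only the last execution timestamp of each calendar day.
# Input: one ISO timestamp per line, e.g. 2019-06-01T23:00:00.
last_per_day() {
  # Sort so the latest timestamp of a day is seen last, bucket by the
  # date part (everything before the 'T'), then print one entry per day.
  sort | awk -F'T' '{ last[$1] = $0 } END { for (d in last) print last[d] }' | sort
}

# Example: three runs across two days collapse to two "final" runs.
printf '%s\n' 2019-06-01T22:00:00 2019-06-01T23:00:00 2019-06-02T23:00:00 |
  last_per_day
```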
The airflow clear and airflow backfill commands have options to specify start and end dates, but not specific instance times, so they will cause 24 reruns per date, all performing exactly the same work.
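Concretely, the granularity problem looks like this: a date-bounded backfill over three days schedules 3 x 24 hourly instances rather than 3. The invocation below is only echoed, never executed, and my_dag is a placeholder DAG id:

```shell
# Illustration only: a date-range backfill of an hourly DAG. Every hourly
# instance between the bounds is rerun, i.e. 24 redundant overwrites of
# each daily partition. my_dag is a placeholder DAG id.
cmd="airflow backfill my_dag -s 2019-06-01 -e 2019-06-04"
echo "$cmd"
```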
We could use airflow trigger_dag to trigger the DAG manually once per day, but then we rerun the whole DAG even when only one task out of many needs the backfill.
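For reference, the manual once-per-day workaround we considered would look roughly like the sketch below (commands are echoed, not executed; my_dag is a placeholder, and -e is assumed to be the Airflow 1.x flag for pinning the execution date):

```shell
# Illustration only: trigger one run per day at the final hourly slot.
# Downside described above: this reruns every task in the DAG, not just
# the one whose query/schema changed. my_dag is a placeholder DAG id.
for day in 2019-06-01 2019-06-02 2019-06-03; do
  echo airflow trigger_dag -e "${day}T23:00:00" my_dag
done
```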