I am using the dbt CLI to regularly update data via dbt run. However, this materializes several tables, and a full refresh can take 20+ hours.
I currently run this from my PC or a cloud VM, but I don't want to keep my PC on (or the VM running) just to run the dbt CLI. Moreover, I've been burned before doing this: a brief Wi-Fi outage interrupted a dbt operation 10 hours into a 12-hour table materialization.
Are there any good patterns for this? Note that I'm using SQL Server, which is not supported by dbt Cloud.
I've considered:
- Setting up Airflow / Prefect
- Having a small VM just for running dbt
- Moving to a faster database (e.g. from Azure SQL to Azure Synapse)
Any ideas?
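For the "small VM just for dbt" option, one minimal sketch is a wrapper script that lives on the VM and retries the run if it exits with a non-zero code, so a transient failure doesn't require you to be at your PC. It only uses the Python standard library and assumes `dbt` is on the VM's PATH; the function name `run_with_retries` and the default retry count and delay are hypothetical choices, not part of dbt. The `runner` and `sleep` parameters exist only so the logic can be tested without invoking dbt.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 3,
    delay_seconds: float = 300.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run a command, retrying while it exits with a non-zero code.

    Hypothetical helper (not a dbt API). Returns the 1-based number of the
    attempt that succeeded; raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(list(cmd))
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Pause before retrying, e.g. to ride out a brief network outage.
            sleep(delay_seconds)
    raise RuntimeError(f"{' '.join(cmd)} failed after {max_attempts} attempts")
```

On the VM you would call `run_with_retries(["dbt", "run"])` from a script scheduled by cron or a similar scheduler, so the run no longer depends on your PC's connection. Note that retrying a whole `dbt run` rebuilds everything from the start, which is why pairing this with smaller chunks of work is usually worthwhile.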
dbt run commands so that you're only running perhaps a schema at a time. That way you won't lose everything in the case of a Wi-Fi interruption or similar. – Branden Ciranni
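The comment's idea of running one chunk at a time can be sketched as follows, assuming your models are organized into one folder per schema. It builds one `dbt run --select path:<folder>` command per folder (dbt's `path:` selector method), runs them in order, and returns the index of the first failed chunk so that a later run can resume from there instead of starting over. The folder names and the helper names `chunk_commands` and `run_chunks` are hypothetical, and the folders must be listed upstream first (e.g. staging before marts) because each command selects only that folder's models.

```python
import subprocess
from typing import Callable, List, Sequence


def chunk_commands(paths: Sequence[str], full_refresh: bool = False) -> List[List[str]]:
    """Build one `dbt run` command per model folder via the `path:` selector."""
    commands = []
    for path in paths:
        cmd = ["dbt", "run", "--select", f"path:{path}"]
        if full_refresh:
            cmd.append("--full-refresh")
        commands.append(cmd)
    return commands


def run_chunks(
    commands: Sequence[Sequence[str]],
    start_at: int = 0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run commands in order, starting at index `start_at`.

    Returns len(commands) if every chunk succeeds; otherwise returns the
    index of the first failed chunk, which can be passed back as `start_at`.
    """
    for i in range(start_at, len(commands)):
        if runner(list(commands[i])).returncode != 0:
            return i
    return len(commands)
```

For example, `run_chunks(chunk_commands(["models/staging", "models/marts"]))` returning `1` would mean the staging models finished and only the marts chunk needs to be rerun, so an interruption costs you at most one chunk rather than the whole refresh.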