I'm trying to set up a simple dbt pipeline that reads a parquet table stored on Azure Data Lake Storage and creates another table that is also stored in the same location.
Under my models/ directory (which is defined as my source path) I have two files, datalake.yml and orders.sql. datalake.yml looks like this:
```yaml
version: 2

sources:
  - name: datalake
    tables:
      - name: customers
        external:
          location: path/to/storage1 # I got this from the file properties in Azure
          file_format: parquet
        columns:
          - name: id
            data_type: int
            description: "ID"
          - name: ...
```
My orders.sql model looks like this:

```sql
{{ config(materialized='table', file_format='parquet', location_root='path/to/storage2') }}

select name, age from {{ source('datalake', 'customers') }}
```
I'm also using the dbt-external-tables package. Note that `dbt debug` reports everything is fine and I can connect to my database (which happens to be Databricks).
When I run `dbt run-operation stage_external_sources`, it returns `Error: staging external sources is not implemented for the default adapter`. When I run `dbt run`, I get `Error: UnresolvedRelation datalake.customers`.
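In case it's relevant: my dbt_project.yml doesn't currently contain any dispatch configuration for the package. From what I gather in the dbt-external-tables README, something like the following might be needed so dbt looks up the package's macros correctly — this is just a sketch based on the README, not something I've verified against the Databricks adapter:

```yaml
# dbt_project.yml (sketch, unverified)
dispatch:
  - macro_namespace: dbt_external_tables
    search_order: ['dbt', 'dbt_external_tables']
```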
Or perhaps I could make use of the Hive metastore instead somehow? Any tips on how I could fix this would be highly appreciated!
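For context, this is roughly what I imagine the manual metastore route would look like — registering the parquet files as an external table myself in Spark SQL so that `datalake.customers` resolves (hypothetical, using the same placeholder path as above):

```sql
-- Hypothetical Spark SQL: register the existing parquet files in the metastore
-- so that source('datalake', 'customers') can resolve to a real table.
CREATE TABLE IF NOT EXISTS datalake.customers
USING PARQUET
LOCATION 'path/to/storage1';
```

But I'd prefer to keep this inside dbt via stage_external_sources if possible.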