I am trying to build an Azure Data Factory pipeline that grabs data from an Azure Data Lake and transforms it.
I want to store the transformed data in an Azure SQL Data Warehouse (as facts and dimensions).
I ended up with two solutions to make the data available in the warehouse:
- Use CREATE EXTERNAL TABLE and reference the transformed data that sit in the Lake in
.csv
files; - Use CTAS to copy the
.csv
files from the Lake into the Warehouse;
Question
It is not obvious to me what the tradeoffs in play. On one hand, it seems to me that the 1st option avoids copies to the warehouse (is it though?). But the 2nd option would offer more options such as indexing.
What are the pros & cons of CREATE EXTERNAL TABLE compared to CTAS?