4
votes

Are there any bandwidth limits or throttling for Azure SQL Data Warehouse extracts? Are there any connection string settings that optimize how fast we can extract data via a SELECT query?

From SSIS on a VM in the same Azure region as SQL DW, if I run a SELECT * query to extract via OLEDB millions of rows with the default connection string (default packet size) I see it use about 55Mbps bandwidth. If I add Packet Size=32767 I see it use about 125Mbps bandwidth. Is there any way to make it go faster? Are there other connection string settings to be aware of?

Incidentally, I was able to get up to about 500Mbps bandwidth coming from SQL DW if I run multiple extracts in parallel. But I can't always break one query into several parallel queries. Sometimes I just need one query to extract the data faster.

Of course Polybase CETAS (CREATE EXTERNAL TABLE AS SELECT) is much more efficient at extracting data. But that's not a good fit in all extract scenarios. For example, if I want to put Analysis Services on top of Azure SQL DW, I can't really involve a CETAS statement during cube processing so Polybase doesn't help me there.

1

1 Answers

2
votes

At the moment your best option is to run multiple extracts in parallel, optimizing the packet size as you have described. For SSAS on top of SQLDW your best option would be to use parallel partition processing.