Join Considerations in Azure SQL Data Warehouse

votes

How should you design your Fact and Dimension tables too speed up joins on the new Azure SQL Data Warehouse?

Would distributing the large fact tables by hash and replicating the smaller dimension tables help speed up the join or should indexing be the main consideration?

sqlazuredata-warehouseazure-sqldw

This is way too broad – Quassnoi

3 Answers

votes

Azure SQL Data Warehouse initially offers two table types - Round Robin and Hash Distributed (see the SQL DW Table docs at https://azure.microsoft.com/documentation/articles/sql-data-warehouse-develop-table-design/).

Generally for dimension tables, you'll choose round robin distribution. For fact tables you'll want to choose HASH based distributed table designs.

**Edit: Replicated is now supported too, which could be a useful option for some dimension tables.

votes

Your basic premise of distributing large fact tables by hash and replicating the smaller dimension tables works great in MPP environments like PDW, but as SQL DW doesn't suppose replicated data (yet - hopefully one day), you'll need to use the Round Robin distribution for that.

If you can minimise data movement, then you make some good steps towards improvement the performance of joins. However, having the data on the right server is only half the battle, and you should consider the indexing strategies as well, just as you would in a regular (SMP) SQL Server environment.

votes

Please note that ADW REPLICATE is in public preview but I think it is still buggy. I have several small tables that I have changed to REPLICATE but when I Join to these replicated tables and look at the explain xml plan, I still see data movement steps which should not be in the data is REPLICATED on all nodes. So to investigate why I did a DBCC PDW_SHOWSPACEUSED on several of the replicated tables and instead of seeing the row count being identical across all nodes they differ with some node having a zero row count. I am no expert by any means but I believe their is work to be done, but I cannot find any forums, discussions or feedback pages to report these issues to.