Slow performance of Power BI report connected to Azure Dedicated SQL pool via Direct Query

Question

Summary

I have a simple power BI report with 3 KPIs that take 0.3 seconds to refresh. Once I move the data to Azure Dedicated SQL Pool, same report takes 6 seconds to refresh. The goal is to have a report that would load in less than 5 seconds with 10-15 KPIs and up to 10 filters applied and connect to much larger (several million lines) datasets.

How to improve performance of Power BI connected via Direct Query to Azure Dedicated SQL pool?

More details

Setup: Dedicated SQL pool > Power BI (Direct Query)

Data: 3 tables (rows x columns: 4x136 user table, 4x3000 date table, 8x270.000 main table)

Data model: Main table to Date table connection - many-to-one based on “date” type field. Main to User connection - many-to-one based on user_id (varchar(10)). Other data types used: date, integers, varchars (50 and less).

Dedicated SQL pool performance level is Gen2: DW200c. However, the number of DWUs used never exceeds 50, most often it stays under 15.

Report: The report includes 3 KPIs that are calculated based on the same formula (just with “5 symbols text” being different):

KPI 1:=
CALCULATE(
    COUNTA('Table Main'[id]),
    CONTAINSSTRING('Table Main'[varchar 50 field 0], "5 symbols text"),
    ISBLANK('Table Main' [varchar 50 field 1]),
    ISBLANK('Table Main' [varchar 50 field 2])
    )

The report includes 6 filters: 5 “basic filtering” with 1 or several values chosen, 1 “string contains” filter.

Problem: The visual displaying 3 KPIs (just 3 numbers) takes 6…8 seconds to load, which is a lot, especially given the need to add more KPIs and filters. (compared to 0,3 seconds if all data is loaded in .pbix file). It is a problem because adding more KPIs on the page (10-15) increases the time to load proportionally or even more for KPI that are more complicated to calculate and reports become unusable.

Question: Is it possible to significantly improve the performance of the report/AAS/SQL pool (2…10 times faster)? How? If not, is it possible to somehow cache the calculated KPIs / visual contents in the report or AAS without querying the data every time and without keeping the data in pbix or AAS model?

Solutions tried and not working: Sole use and different combinations of clustered columnstore, clustered rowstore, non-clustered rowstore indexes. Automated statistics on/off. Automated indexes and stats do give an improvement of 10…20%, but it is definitely not enough.

Simple list of values (1 column from any table) takes 1.5 to 4 seconds to load)

What I have tried

Moving SQL pool from West Europe to France and back. No improvements
Applying indexes: row- and columnstore, clustered and non-clustered, manual and automatically defined (inl. statistics) - gives 10...20% improvements in performance, which does not resolve the issue.
Changing resource classes: smallrc, largerc, xlargerc. % of DWUs used still never exceeds 50 (out of 200). No improvements
Shrinking data formats and removing excessive data: minimal nvarchar(n) possible, the biggest is nvarchar (50), all excessive columns have been removed. No improvements
Isolating the model: I have a larger data model, for testing puproses I have isolated 3 tables into a separate model to make sure other parts do not affect the performance. No improvements.
Reducing number of KPI and filters With only 2 report filters left (main table fields only) the visual takes 2 seconds to load. With +2 filters on the connected date table 2,5 sec, with +2 filters on user table 6 sec. Which means that I would only be able to use 1-2 filter reports, which is not acceptable.

Kashyap Kashyap · Accepted Answer · 2021-02-05T23:45:29

It's a bit of trial and error process unfortunately. Here are a few things that are not in your list already:

many-to-one based on user_id (varchar(10)) -> Add a numeric column which is a hash of user_id column and use that to join instead of varchar column.
Ensure your statistics are up to date.
Try Dual mode. Load smaller dimension tables in-memory and keep fact tables in DB.
Use aggregates such that unless a user is trying to drill down report is actually populated without querying DB.
Partition your fact table by appropriate column.
Make sure you're using the right distribution for your fact table and have chosen the right column.

Be careful with Partitioning and Distribution. Synapse is designed a little differently than traditional RDBs (like MySQL), it's closer to NoSQL DBs to some extent, but not completely. So understand how these concepts work in Syanpse before using them (or your might get worse performance)!

Slow performance of Power BI report connected to Azure Dedicated SQL pool via Direct Query

1 Answers