Most efficient Snowflake connection type from PowerBI?

Question

We're trialling PowerBI on a Snowflake dimensional model and performance seems very non-optimised. Can anyone point me to information on best practices for this connection? I've previously used Tableau and there's an excellent white paper describing the pros/cons of each connection type and how to set this up so that as much heavy lifting as possible is done in Snowflake, with minimal load on the viz tool.

e.g. when you summarise 1 million invoices to get a chart of sales volume by year that distils this to 10 data points, Tableau would send 'SELECT year, sum(volume) FROM t GROUP BY year' (~10 rows), but in PowerBI we see Snowflake receiving a query like 'SELECT invoice_id, sum(volume) FROM t GROUP BY invoice_id' (~1M rows), leaving the viz tool to do a lot more work.
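To make the difference concrete, here is a minimal sketch that runs both query shapes and counts the rows each one returns. It uses Python's built-in sqlite3 as a stand-in for Snowflake. The table `t` and its columns come from the queries above; the row count, year range, and volumes are made up for illustration.

```python
import sqlite3


def result_sizes(n_invoices=10_000, n_years=10):
    """Return (rows from GROUP BY year, rows from GROUP BY invoice_id)."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in, not Snowflake
    conn.execute(
        "CREATE TABLE t (invoice_id INTEGER PRIMARY KEY, year INTEGER, volume REAL)"
    )
    # Synthetic data: invoices spread evenly over n_years, volume 1.0 each.
    conn.executemany(
        "INSERT INTO t (invoice_id, year, volume) VALUES (?, ?, ?)",
        ((i, 2010 + i % n_years, 1.0) for i in range(n_invoices)),
    )
    # The query shape we would like the viz tool to push down.
    by_year = conn.execute(
        "SELECT year, sum(volume) FROM t GROUP BY year"
    ).fetchall()
    # The query shape we actually see arriving at the warehouse.
    by_invoice = conn.execute(
        "SELECT invoice_id, sum(volume) FROM t GROUP BY invoice_id"
    ).fetchall()
    conn.close()
    return len(by_year), len(by_invoice)


if __name__ == "__main__":
    print(result_sizes())  # (10, 10000)
```

The first query returns one row per year, while the second returns one row per invoice, so the result set the viz tool has to process grows with the size of the fact table.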

So far, we've tried mapping the individual facts and dimensions within PowerBI, and also using a mix of direct query and import, but without significant improvement. Is there any guidance on best practice?

Thanks in advance!

I'm not an experienced PowerBI guy, but some of my customers are using it with Snowflake. If it imports large rowsets, the connection is probably using import. PowerBI supports live mode with Snowflake, so you'll want to switch it to using that. community.powerbi.com/t5/Community-Blog/… — Greg Pavlik
The poor performance persists when using direct query. I do wonder whether MSFT are dragging their feet on making this interface more efficient because it would undercut their Azure data warehouse offerings. — Tango delta

Giovanni Luisotto · Accepted Answer · 2020-02-24T13:46:23

I've never used Snowflake, and I have no clue how PowerBI interfaces with it. That said, on the PowerBI side you may be interested in the composite model and aggregations.

MS Docs:

Radacad's blog about aggregations:

In practice, when you use a composite model, the aggregation functionality lets you create a hidden table (in import mode) in your model that holds aggregated data (by year, month, customer, etc.). When you query your data, PowerBI checks whether this table can answer the query. If it can, PowerBI reads the data from this table; otherwise, it runs a query against the source (direct query).
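The check described above can be sketched roughly as follows. This is only an illustration of the idea, not how PowerBI implements it: `AggTable`, `route_query`, and the table and column names are hypothetical, and the real matching rules are more involved than a simple subset test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggTable:
    """Hypothetical description of a hidden, imported aggregation table."""
    name: str
    group_by: frozenset  # columns the aggregated data is grouped by
    measures: frozenset  # pre-aggregated measures, e.g. "sum(volume)"


def route_query(group_by, measures, agg_tables):
    """Return the name of the first aggregation table that covers the query.

    A table covers the query when it holds every requested grouping column
    and every requested measure. Falls back to "DirectQuery" otherwise.
    """
    for agg in agg_tables:
        if set(group_by) <= agg.group_by and set(measures) <= agg.measures:
            return agg.name
    return "DirectQuery"


# Assumed example: one aggregation table grouped by year, month and customer.
aggs = [
    AggTable(
        name="sales_agg",
        group_by=frozenset({"year", "month", "customer"}),
        measures=frozenset({"sum(volume)"}),
    )
]

# Sales volume by year is answered from the imported aggregation table.
print(route_query(["year"], ["sum(volume)"], aggs))
# A per-invoice query is not covered, so it goes to the source.
print(route_query(["invoice_id"], ["sum(volume)"], aggs))
```

The subset test only makes sense for additive measures such as sums, which can be rolled up again from a finer grain (year and month) to a coarser one (year).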

The example you shared about PowerBI querying the source without asking for aggregation (but instead asking for every single InvoiceId) might be caused by not setting up the composite model correctly.

A table in "direct query" mode cannot reference other tables in its query (in this case the calendar) unless those tables are also in "direct query" or "dual" mode. What does the model look like in the case you shared, and what is the storage mode of each table?
