3 votes

I'm trying to assess the best approach, in terms of cost and performance, for accessing data and running queries on a dataset using Power BI with Azure Analysis Services versus Power BI with DirectQuery.

I have tried to illustrate both approaches in the following diagram:

[architecture diagram]

In the diagram, steps 4 and 5 describe accessing data in Azure Data Lake using Power BI with DirectQuery, whereas steps 4 and 6 describe accessing data using Power BI with Azure Analysis Services.

From my own research, DirectQuery is notorious for performance issues, e.g.:

All DirectQuery requests are sent to the source database, so the time required to refresh a visual is dependent on how long that back-end source takes to respond with the results from the query (or queries).

The above statement is well documented; however, in my design DirectQuery requests shouldn't be an issue, because most of the logic and transformation will take place in Databricks (although I don't want this question to focus on Databricks).

On the other hand, with Azure Analysis Services (AAS) queries are answered from an in-memory model rather than being pushed to the source as in DirectQuery, and are therefore much faster.

So, I would like you to share your experience using DirectQuery and AAS, and to let me know if I have missed any advantages/disadvantages of using one technology over the other.

Comments:

What size is the data? – GregGalloway

Generally speaking, the data will range from 1 GB to 20 GB. – Carltonp

1 Answer

5 votes

The Power BI (PBI) data model is a lighter-weight version of Analysis Services. If you have Power BI Desktop open, you can open Task Manager and see that there is an Analysis Services instance running in the background. In Power BI the dataset size is limited to 1 GB; in Premium it is 10 GB, with the ability to refresh up to 12 GB.

[Screenshot: the Analysis Services instance behind Power BI Desktop]

Analysis Services will be able to hold more data, as it is not bound by those dataset size limits, and it also offers features aimed at enterprise organisations. Like Power BI, Analysis Services can either sit over a data source in DirectQuery mode or import the data.

In your question you mentioned that DirectQuery mode 'is notorious for having performance issues'; however, that will depend on the structure and size of the data source. For a number of projects that I have deployed, I have used DirectQuery over data sources of at least 50-100 GB. These have mostly been standard star-schema data warehouses or a defined reporting table, both with the relevant indexes, covering indexes, or columnstore indexes to allow more efficient retrieval of data. DirectQuery mode will slow down because of the number of queries it has to run against the data source, which is driven by the measures, the relationships, and the connection overhead. Another factor can be the number of visuals on a page: each visual is a query, and each one has to run against the data source.
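As a rough sketch of the kind of source-side tuning mentioned above, a clustered columnstore index on a star-schema fact table suits the scan-and-aggregate queries that DirectQuery generates. The table and column names below are illustrative assumptions, not from the question:

```sql
-- Hypothetical star-schema fact table; names are illustrative only.
CREATE TABLE dbo.FactSales (
    DateKey      INT NOT NULL,
    ProductKey   INT NOT NULL,
    CustomerKey  INT NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL,
    Quantity     INT NOT NULL
);

-- A clustered columnstore index stores the data column-wise and compressed,
-- which speeds up the large scans and aggregations typical of
-- DirectQuery-generated queries against a fact table.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
    ON dbo.FactSales;
```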

One other method to increase the speed of DirectQuery is to use Aggregations in Power BI, which store an imported subset of the data inside Power BI. If a query can be answered by the aggregation layer, it will be answered more quickly. Microsoft demonstrated this with the 'Trillion Row Demo'.
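One way to back such an aggregation is a pre-summarised table in the source that Power BI imports and maps to the detail table. A minimal sketch, with assumed names and grain, might be:

```sql
-- Hypothetical summary table at Date/Product grain; names are illustrative.
-- Imported into Power BI and mapped to the DirectQuery detail table via
-- "Manage aggregations", so visuals at this grain never hit the large fact table.
CREATE TABLE dbo.FactSales_Agg (
    DateKey         INT NOT NULL,
    ProductKey      INT NOT NULL,
    SalesAmountSum  DECIMAL(18, 2) NOT NULL,
    QuantitySum     BIGINT NOT NULL,
    RowCnt          BIGINT NOT NULL  -- supports count-style measures at this grain
);

INSERT INTO dbo.FactSales_Agg (DateKey, ProductKey, SalesAmountSum, QuantitySum, RowCnt)
SELECT DateKey, ProductKey, SUM(SalesAmount), SUM(Quantity), COUNT(*)
FROM dbo.FactSales
GROUP BY DateKey, ProductKey;
```

Queries at or above the Date/Product grain are then answered from the small imported table, while anything more detailed falls through to DirectQuery.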

In terms of the Power BI DirectQuery issues, among the range of clients that I interact with, those that do have issues with DirectQuery have a mash-up of tables in an inefficient schema, run sub-optimal queries on the data source, do a number of data transformations in DAX, and have badly written DAX measures, for example lots of DISTINCTCOUNTs and SWITCHes.

So, if you wish to import the data and it is over the Power BI dataset size limits, then Analysis Services is your best option. If you can set up the data structure in a good way, there should be no issues with Power BI and DirectQuery.

Hope that helps