3 votes

I am exploring Azure Data Lake and I am new to this field. I have explored many things and read many articles. Essentially, I need to build a Power BI dashboard from data in several different sources.

In the classic SQL Server stack, I can write an ETL (Extract, Transform, Load) process to bring the data from my source databases into a Data Warehouse database, and then expose that warehouse to Power BI through SSAS, etc.

But I want to use Azure Data Lake, so I have explored Azure Data Lake Store and Azure Data Lake Analytics (U-SQL). I drew the following architecture diagram.

[architecture diagram]

  1. Is there anything I am missing in the current flow of the application?
  2. I can get data directly from Azure Data Lake using Power BI, so there is no need for a Data Warehouse. Am I right?
  3. I can create a database in Azure Data Lake; will that serve as my Data Warehouse?
  4. What is the best format for the output data produced from the original files in Azure Data Lake, e.g. .csv?
2 Answers

9 votes

1 & 2) Currently ADLS has only limited support for letting Power BI query it directly. If your data is too large (greater than about 10 GB, I believe), Power BI cannot work directly over data in your ADLS account. In that case, I would recommend moving your processed data from ADLS into a SQL Database or SQL Data Warehouse, which allows Power BI to operate over larger amounts of data. You can use Azure Data Factory to move the data, or PolyBase if you are loading into SQL DW.

3) A data lake is still distinct from a data warehouse, and they have separate strengths and weaknesses. The data lake is best for storing your raw or lightly processed data, which may come in a variety of formats and schemas. After you process and filter that data with Azure Data Lake Analytics, you can move it into SQL DW for interactive analytics and data management (at the cost of a fixed schema).
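To make the "process and filter in the lake" step concrete, here is a minimal U-SQL sketch. The file path, database name, and column names (`/raw/orders.csv`, `SalesDb`, `CustomerId`, etc.) are hypothetical placeholders, not anything from your setup:

```usql
// Read a raw file from ADLS (hypothetical path and schema).
@raw =
    EXTRACT CustomerId int,
            Amount     decimal,
            OrderDate  DateTime
    FROM "/raw/orders.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

// Light filtering/shaping in the lake before moving anything downstream.
@clean =
    SELECT CustomerId, Amount, OrderDate
    FROM @raw
    WHERE Amount > 0;

// A U-SQL database/table physically lives in ADLS, but it is a batch
// analytics store for ADLA jobs, not an interactive warehouse.
CREATE DATABASE IF NOT EXISTS SalesDb;
USE DATABASE SalesDb;

CREATE TABLE IF NOT EXISTS dbo.Orders
(
    INDEX idx_Orders CLUSTERED (CustomerId ASC)
    DISTRIBUTED BY HASH (CustomerId)
) AS SELECT * FROM @clean;
```

This also answers question 3 in practice: the U-SQL table is queryable by later ADLA jobs, but Power BI users would still be better served by that data landing in SQL DW.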

4) It depends on your use case. If you plan to keep processing the data in ADLS, I recommend outputting to an ADLS table for better performance. If you need to pass the data to another service, CSV is a good choice. You can find more outputters, such as JSON and XML, on our GitHub.
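As a sketch of the CSV option, a U-SQL script ends with an `OUTPUT` statement; the rowset and file paths below are illustrative:

```usql
// Hypothetical input; in practice @result would come from your
// transformation logic.
@result =
    EXTRACT Region string,
            Sales  decimal
    FROM "/processed/sales.tsv"
    USING Extractors.Tsv();

// Outputters.Csv/Tsv/Text are built in; JSON and XML outputters come
// from the Microsoft.Analytics.Samples.Formats assembly on GitHub.
OUTPUT @result
TO "/output/sales_summary.csv"
USING Outputters.Csv(outputHeader : true);
```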

3 votes

This answer may not be timely, but what I've tried that is closest to your prior experience is spinning up an instance of Azure Analysis Services. You can create a tabular or MDX model, load a large amount of data into memory, and connect to it from Power BI. The "only" catch is that it can get pricey quickly. One great thing about AAS is that the interface for building a tabular model closely follows Power Query and uses DAX.

Also, I believe these days the ADLA store is basically gone in favor of using Blob Storage directly, so you'd go data --> blob --> ADLA --> AAS --> Power BI.