3 votes

Question

I think this is a pretty common issue, so I hope there are solutions/approaches we can reuse.

We're building a data lake in Azure ADLS Gen2 with a unidirectional data flow: NiFi/ADF -> ADLS -> ETL/Spark/Databricks -> Data Warehouse -> Power BI. Some ETL inputs should be loaded/updated by the responsible business users on a weekly/monthly basis.

Could you please suggest or improve upon solutions that let business users upload ETL inputs while meeting the requirements below?

Requirements

  1. User-friendly interface.
  2. Parsing/validation. We must make sure that the data fit the expected format during upload (a minimal validation sketch follows this list).
  3. Audit logging. We must be able to track who loaded what and when. If an upload is delayed, notifications should be sent.
  4. Simplicity of implementation and compliance with the current architecture.
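
To make requirement 2 concrete, here is a minimal sketch of what parsing/validation could look like on the Spark side. The paths, the expected columns, and the quarantine location are illustrative assumptions, not part of the existing pipeline:

```python
# Minimal PySpark validation sketch (illustrative; paths and schema are assumptions).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DateType, DecimalType

spark = SparkSession.builder.getOrCreate()

# Expected layout of the business users' upload (example columns).
expected_schema = StructType([
    StructField("customer_id", StringType(), nullable=False),
    StructField("report_date", DateType(), nullable=False),
    StructField("amount", DecimalType(18, 2), nullable=True),
])

raw_path = "abfss://uploads@mydatalake.dfs.core.windows.net/inbox/sales.csv"    # assumed
good_path = "abfss://curated@mydatalake.dfs.core.windows.net/sales/"            # assumed
bad_path = "abfss://uploads@mydatalake.dfs.core.windows.net/quarantine/sales/"  # assumed

# Parse against the expected schema; in PERMISSIVE mode malformed values become nulls.
df = (spark.read
          .option("header", "true")
          .option("mode", "PERMISSIVE")
          .schema(expected_schema)
          .csv(raw_path))

# Treat rows with nulls in mandatory columns as invalid.
invalid = df.filter(F.col("customer_id").isNull() | F.col("report_date").isNull())

if invalid.limit(1).count() > 0:
    # Quarantine the bad rows for follow-up/notification instead of loading them.
    invalid.write.mode("overwrite").json(bad_path)
else:
    # Delta output assumes Databricks/Delta Lake is available.
    df.write.mode("append").format("delta").save(good_path)
```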

How well each requirement is met is rated from 1 (a very bad-practice approach) to 5 (a completely clean, easy-to-implement solution).

Possible solutions

  1. Upload files from Power Apps to Data Warehouse.
    Flow: Business users -> Power Apps -> Data Warehouse & Stored Procedures -> ADLS -> Spark -> Data Warehouse -> Power BI.

    • 1st requirement = 5. Very user-friendly interface built with Power Apps.
    • 2nd requirement = 2. Validations/transformations would have to be implemented in SQL stored procedures, which is a poor fit since all other application code is written in Spark.
    • 3rd requirement = 3-5. Not sure how to implement this yet.
    • 4th requirement = 2. The data flow becomes bidirectional (DW -> ADLS -> DW), which is harder to reason about and orchestrate.
  2. Use Spark/Databricks over ADLS events/triggers.
    Flow: Business users -> Microsoft Storage Explorer app -> ADLS gen2 -> Azure Blob Storage trigger -> Azure Function -> Spark parsing/validation job -> ADLS gen2

    • 1st requirement = 3-4. Uploading through Storage Explorer is very user-friendly; the only issue is that success/failure notifications would arrive by email, which may not be very clear to users.
    • 2nd requirement = 5. I like that parsing/validation happens on the ETL side rather than in Data Warehouse stored procedures.
    • 3rd requirement = 1-3. It isn't clear how to achieve this yet; I expect it will be worse than with Power Apps.
    • 4th requirement = 4. The process stays unidirectional; data aren't moved from the DW back to the data lake. It's a 4 rather than a 5 because it isn't obvious to business users that success/failure notifications will come through email, and the implementation is somewhat more complex (see the sketch after this list).
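
For illustration, the trigger step in option 2 could be wired roughly as below: an Azure Function (Python) with a Blob trigger that starts a pre-created Databricks parsing/validation job through the Jobs 2.1 API. The binding configuration (not shown), the workspace URL, the token handling, and the job ID are placeholders/assumptions:

```python
# Azure Function sketch: react to a new file landing in ADLS Gen2 and start a
# Databricks validation job via the Jobs 2.1 "run-now" endpoint.
# The blobTrigger binding named "uploaded" lives in function.json (not shown).
# DATABRICKS_HOST, DATABRICKS_TOKEN and VALIDATION_JOB_ID are assumed app settings.
import logging
import os

import azure.functions as func
import requests


def main(uploaded: func.InputStream):
    logging.info("New upload detected: %s (%s bytes)", uploaded.name, uploaded.length)

    host = os.environ["DATABRICKS_HOST"]           # e.g. https://adb-123.azuredatabricks.net
    token = os.environ["DATABRICKS_TOKEN"]         # better: read from Key Vault
    job_id = int(os.environ["VALIDATION_JOB_ID"])  # pre-created parsing/validation job

    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "job_id": job_id,
            # Tell the validation notebook which file to check.
            "notebook_params": {"input_path": uploaded.name},
        },
        timeout=30,
    )
    resp.raise_for_status()
    logging.info("Triggered validation run %s", resp.json().get("run_id"))
```

The same function would also be a natural place to write an audit record (who uploaded what and when) and to send the success/failure email before the Spark job runs.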

Comments

Have you looked at Azure Storage Explorer? – wBob
@wBob Yes, I'm currently using it for managing ADLS. Why do you ask? – VB_
It meets a lot of your requirements but admittedly is not intended as a tool for non-technical business users. The trade-off, i.e. developing your own tools, is considerable, so I was thinking you could offset that with some education. For example, run 30-minute sessions with your users on installing and using Azure Storage Explorer, as opposed to, say, several weeks of custom app development and testing, which would still require some user education. – wBob
@wBob Yeah, that's actually my option two. The only concern is that business users need feedback on whether their data are valid as soon as possible. Does an ADLS-triggered Spark job seem reasonable to you? It doesn't seem great to me, but I can't see a better option at the moment. – VB_
How free are you in choosing your tools? Is it possible to use tools other than those mentioned (Power BI, Spark)? I understand that it has to be fully on Azure, correct? Also: what would be your ideal weighting of your four requirements? I mean, if you order them, which do you want to have first? – B--rian

1 Answer

1 vote

The scene

I understand that you are looking for a unidirectional ETL process/architecture which has the following features (ordered by priority):

  • is user-friendly
  • validates input data
  • parses the input data using SQL stored procedures
  • is simple to implement, making use of Azure Data Lake Storage (ADLS)
  • provides at least basic audit features by logging who uploaded what and when to the data lake

The definition of user-friendly is also a bit vague, since users will eventually get used to tools which are not intuitive to use - you can e.g. force them to attend courses. I know business users who are not at all keen on using e.g. Power BI, but they do not have any choice but to use it.

The suggestion

My general experience is that customized front-ends tailored to the business needs lead to much happier users than letting them use a massive all-in-one Swiss army knife of which only a few functions are used in their daily business. I have never seen PowerApps in use, but what I gather from its website and from https://alternativeto.net/software/microsoft-powerapps/ is that it is some kind of low-code app/UI building platform.

I would personally go for low-code tools with a large user community, such as Tableau, Qlik, or Appian. I am not associated with any of them, but I managed to connect all of them to SQL databases rather quickly, which is why I mention those three.

You said

Storage Explorer has a user-friendly enough interface and zero implementation cost. The only concern is about validation feedback and audit. To mitigate the validation concern, we may create some kind of Excel templates for business users.

So I guess you will probably go for this solution, but I would never sacrifice audit. As with backups, one usually only learns what the real cost of not having a backup or an audit trail is when something goes wrong. In the case of a cyber attack or white-collar crime, a business is usually in severe need of log files.
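
If audit is the main gap, one option - an assumption on my side, not something I know about your setup - is to enable diagnostic settings on the storage account so that blob operations are sent to a Log Analytics workspace, and then pull "who uploaded what and when" from there, for example with the azure-monitor-query SDK:

```python
# Sketch: query "who uploaded what and when" from storage diagnostic logs.
# Assumes diagnostic settings route blob logs to a Log Analytics workspace
# (StorageBlobLogs table); the workspace ID is a placeholder, and the exact
# column/operation names may differ depending on how your clients upload.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
StorageBlobLogs
| where OperationName in ("PutBlob", "PutBlockList", "FlushFile")
| project TimeGenerated, Uri, RequesterUpn, CallerIpAddress, StatusText
| order by TimeGenerated desc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=query,
    timespan=timedelta(days=7),
)

# Print one audit line per logged upload operation.
for table in response.tables:
    for row in table.rows:
        print(list(row))
```

A scheduled alert rule on a similar query could also cover the "notify on delays" part of your third requirement.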