0
votes

I have a requirement to classify PII and Non PII data from Azure Data lake Gen 1 file. As i am looking not to duplicate the Physical file creation in Data lake, can we create logical views in Data lake preferably compare to materialized views.

Preferable Technology - Azure Synapse analytics/ Databricks / VM with spark process,...

Also technology which is easy to Integrate with Active Directory for RBAC.

Let me know the possibilities.

1

1 Answers

0
votes

You can achieve this today with Azure Synapse SQL on-demand. Here is how:

  1. Create a database
  2. Create database scoped credentials
  3. Create external data source using that credential
  4. Create both VIEWs over files

Here are some docs that can help you with these steps: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/create-use-external-tables https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/create-use-views

Once that is done, you should create logins and inside of the database create users tied to those logins. This doc can help: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/access-control#database-level-permission

Now you should grant appropriate permissions, based on your description I think you would need these permissions:

  1. GRANT SELECT on a VIEW – doc: https://docs.microsoft.com/en-us/sql/t-sql/statements/grant-object-permissions-transact-sql?view=sql-server-ver15
  2. GRANT REFERENCES on a CREDENTIAL – doc: https://docs.microsoft.com/en-us/sql/t-sql/statements/grant-database-scoped-credential-transact-sql?view=sql-server-ver15
  3. DENY ADMINISTER DATABASE BULK OPERATIONS – doc: https://docs.microsoft.com/en-us/sql/t-sql/statements/deny-database-permissions-transact-sql?view=sql-server-ver15

This will give you a database with views and users that can access VIEW they were allowed to access and they won’t be able to create new objects in the database nor use the credential to do random queries over the lake.