S3 Integration with Snowflake: Best way to implement multi-tenancy?

Question

My team is planning on building a data processing pipeline that will involve S3 integration with Snowflake. This article from Snowflake shows that an AWS IAM role must be created in order for Snowflake to access S3's data.

However, in our pipeline, we need to ensure multi-tenancy and data isolation between users. For example, let's assume that Alice and Bob has files in S3 under "s3://bucket-alice/file_a.csv" and "s3://bucket-bob/file_b.csv" respectively. Then, we want to make sure that, when staging Alice's data onto Snowflake, Alice can only access "s3://bucket-alice" and nothing under "s3://bucket-bob". This means that individual AWS IAM roles must be created for each user.

I do realize that Snowflake has it's own access control system, but my team wants to make sure that data isolation is fully achieved from the S3-to-Snowflake stage of the pipeline, and not only relying on Snowflake's access control.

We are worried that this will not be scalable, as AWS sets a limit of 5000 IAM users, and that will not be enough as we scale our product. Is this the only way of ensuring data multi-tenancy, and does anyone have a real-world application example of something like this?

Hi - what is your definition of multi-tenancy in this scenario? Do you plan for users within each tenancy to potentially have access to all Snowflake functionality or do you plan to limit what they can do e.g. they can only query data and you, as the owner of the multi-tenancy, will be responsible for any data loading? — NickW

Mike Walton Mike Walton · Accepted Answer · 2021-02-20T01:39:13

Have you explored leveraging Snowflake's Internal Stage, instead? By default, every user gets their own internal stage that only they have permissions to from within Snowflake and NO access outside of Snowflake. Snowflake offers the ability to move data in and out of that Internal Stage using just about every driver/connector that Snowflake has available. This said, any pipeline/workflow that is being leveraged by 5000+ users would be able to use these connectors to load data to Snowflake Internal Stage (S3) without the need for any additional AWS IAM Users. Would that be a sufficient solution for your situation?

S3 Integration with Snowflake: Best way to implement multi-tenancy?

1 Answers