My team is planning on building a data processing pipeline that will involve S3 integration with Snowflake. This article from Snowflake shows that an AWS IAM role must be created in order for Snowflake to access S3's data.
However, in our pipeline, we need to ensure multi-tenancy and data isolation between users. For example, let's assume that Alice and Bob has files in S3 under "s3://bucket-alice/file_a.csv"
and "s3://bucket-bob/file_b.csv"
respectively. Then, we want to make sure that, when staging Alice's data onto Snowflake, Alice can only access "s3://bucket-alice"
and nothing under "s3://bucket-bob"
. This means that individual AWS IAM roles must be created for each user.
I do realize that Snowflake has it's own access control system, but my team wants to make sure that data isolation is fully achieved from the S3-to-Snowflake stage of the pipeline, and not only relying on Snowflake's access control.
We are worried that this will not be scalable, as AWS sets a limit of 5000 IAM users, and that will not be enough as we scale our product. Is this the only way of ensuring data multi-tenancy, and does anyone have a real-world application example of something like this?