I have data in S3 and RDS, and use AWS Glue + Athena as a DW currently.
My customers (500+) are:
- Internal customers: for querying (SQL) data and reporting (hence Athena)
- External customers: for querying (SQL) their data, hooking up the the data warehouse to a BI product (e.g. Tableau) for reporting
Questions:
- Is Redshift a better DW (vs Athena/Glue) or RDS?
- How do I setup the DB so that it is appropriate for external customers to access a limited subset of data of the DW?
What I think:
- Redshift is better for concurrent SQL queries, scalability, performance etc and is able to integrate with the most common BI resources.
- Redshift is not good for limiting access / I have not seen a smart way of restricting access at a table level. One option is to 'copy' the data into a new schema but this means duplication of data. In RDS (PostgreSQL) you can create a view of the data and expose that to specific clients only. If there is a smart way to tackle this I am curious?