I've started learning AWS Redshift and I keep coming across things that I believe work against a traditional data warehouse star/snowflake schema.
Based on user responses, everyone recommends an insert-only approach in Redshift for the best performance, because it is designed for reads. But doesn't that increase the cost of storage? I am currently working on MSBI, and my fact and dimension tables have a complex structure. For example: one fact table is shared across several business areas (data marts), a few dimensions are Type 2 (where I have to track history) while a few are not, and a few complex scenarios need a snowflake design.
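To make it concrete, this is roughly the insert-only SCD Type 2 pattern I understand is being recommended; all the names (dim_customer, stg_customer, customer_bk, city) are placeholders, not my real schema:

```sql
-- Insert-only SCD Type 2 sketch: never UPDATE the dimension,
-- just append a new row whenever a tracked attribute changes.
CREATE TABLE dim_customer (
    customer_sk  BIGINT IDENTITY(1,1),  -- surrogate key
    customer_bk  VARCHAR(50),           -- business key from the source system
    city         VARCHAR(100),          -- tracked attribute
    valid_from   TIMESTAMP
)
DISTKEY (customer_bk)
SORTKEY (customer_bk, valid_from);

-- Append a new version only for new customers or customers whose attributes changed.
INSERT INTO dim_customer (customer_bk, city, valid_from)
SELECT s.customer_bk, s.city, GETDATE()
FROM stg_customer s
LEFT JOIN (
    -- latest version currently stored for each business key
    SELECT customer_bk, city,
           ROW_NUMBER() OVER (PARTITION BY customer_bk ORDER BY valid_from DESC) AS rn
    FROM dim_customer
) d ON d.customer_bk = s.customer_bk AND d.rn = 1
WHERE d.customer_bk IS NULL   -- brand-new customer
   OR d.city <> s.city;       -- attribute changed, so add a new version row
```

The current row is then resolved at query time with a window function instead of an is_current flag, which is exactly why the history keeps accumulating and the storage keeps growing.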
Considering the cost of storage and computation in the cloud, I would like to keep a minimal amount of data there (the same as I do in my on-premise system, which amounts to about 4 TB of storage).
Now, if I follow the same approach I've been using on premise, I would have to run my ETL, compare key columns against a staging table, and then perform CRUD operations, which makes it pointless to move my existing system to the cloud. If I go with a flat table structure instead, I will end up with 4-6 times more data in my tables, which increases the cost of storage in the cloud, and the computation on top of it may cost extra.
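For reference, the on-premise style merge I am describing would translate into something like the following staged compare-and-merge (delete then re-insert) in Redshift; fact_sales, stg_sales and the key columns are placeholders:

```sql
-- Staging merge: load changed/re-delivered rows into a staging table,
-- then delete the matching rows from the fact and insert the fresh versions.
BEGIN TRANSACTION;

-- remove fact rows that are being re-delivered by the source
DELETE FROM fact_sales
USING stg_sales s
WHERE fact_sales.order_id = s.order_id
  AND fact_sales.line_no  = s.line_no;

-- insert the current version of those rows (plus any brand-new ones)
INSERT INTO fact_sales
SELECT * FROM stg_sales;

END TRANSACTION;
```

This keeps the data volume minimal, but every load turns into deletes plus inserts (and the vacuuming that follows), which is the part that feels at odds with the insert-only advice.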
How to handle Slowly Changing Dimension Type 2 in Redshift?
Redshift Performance of Flat Tables Vs Dimension and Facts
The answers to the questions above talk about how flat tables can be a better fit for Redshift, but the Redshift blog talks about how a star schema can be optimized:
Star and snowflake schemas run well on Amazon Redshift, and the addition of interleaved sort keys further enhances performance by reducing I/O for a wider range of filter predicates on a table when needed.
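If I understand that correctly, a fact table in a star schema would be declared along these lines (purely illustrative names, assuming filters commonly hit several different dimension keys):

```sql
-- Star-schema fact table with an interleaved sort key, so that filter
-- predicates on any of the listed dimension keys benefit from reduced I/O.
CREATE TABLE fact_orders (
    order_date_sk  INTEGER,
    customer_sk    BIGINT,
    product_sk     BIGINT,
    store_sk       INTEGER,
    quantity       INTEGER,
    amount         DECIMAL(18,2)
)
DISTSTYLE KEY DISTKEY (customer_sk)
INTERLEAVED SORTKEY (order_date_sk, customer_sk, product_sk, store_sk);
```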
Now if I choose the insert-only approach (which complements the Redshift architecture), I will end up paying more for storage, and if I choose a traditional data warehouse design, I will end up paying extra for computation.
Is there a real-world example you can share that would help me understand the approach you have followed in Redshift?