Production Cluster details:
- Node Type dc1.8xlarge
- Nodes 25
- 2.56TB SSD storage per node
Test Cluster details:
- Node Type ds2.xlarge
- Nodes 6
- 2TB HDD storage per node
When the same table, with exactly the same DDL and encoding, is unloaded from the production cluster and copied into the test cluster, its disk footprint shrinks by roughly two orders of magnitude (about 70x to 100x in the examples below). I have tested this with multiple tables with different distribution styles and sort key patterns.
Example:
Table A (No sort key, DISTSTYLE EVEN) - Size in production: 60 GB; Size in test: 0.6 GB
Table B (Sort key, DISTSTYLE KEY) - Size in production: 96 GB, 100% sorted; Size in test: 1.4 GB, 100% sorted
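To put a number on the reduction, here is a quick calculation using only the sizes listed above. It is just a sketch in Python; the helper name shrink_factor is my own and purely illustrative, not part of any AWS tooling.

```python
# Reported table sizes in GB, copied from the two examples above.
tables = {
    "Table A": {"production_gb": 60.0, "test_gb": 0.6},
    "Table B": {"production_gb": 96.0, "test_gb": 1.4},
}


def shrink_factor(production_gb: float, test_gb: float) -> float:
    """Return how many times smaller the table is on the test cluster."""
    return production_gb / test_gb


for name, size in tables.items():
    factor = shrink_factor(size["production_gb"], size["test_gb"])
    print(f"{name}: {factor:.1f}x smaller in test")
# Table A: 100.0x smaller in test
# Table B: 68.6x smaller in test
```

So the same data occupies about 100 times less space for Table A and about 69 times less for Table B.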
Any ideas what could cause this discrepancy? I have read through most of the Redshift forums but have not been able to find an explanation. I am using the admin view v_space_used_per_tbl (provided by AWS) to calculate the size of each table.