This is more like a puzzling question for me and would like to understand why.
I have two tables, almost identical the only differences are one column's data type and sortkey.
table mbytes rows
stg_user_event_properties_hist 460948 2378751028
stg_user_event_properties_hist_1 246442 2513860837
Even though they have almost same number of rows, size is close to double.
Here are the table structures
stg.stg_user_event_properties_hist
(
id bigint,
source varchar(20),
time_of_txn timestamp,
product varchar(50),
region varchar(50),
city varchar(100),
state varchar(100),
zip varchar(10),
price integer,
category varchar(50),
model varchar(50),
origin varchar(50),
l_code varchar(10),
d_name varchar(100),
d_id varchar(10),
medium varchar(255),
network varchar(255),
campaign varchar(255),
creative varchar(255),
event varchar(255),
property_name varchar(100),
property_value varchar(4000),
source_file_name varchar(255),
etl_batch_id integer,
etl_row_id integer,
load_date timestamp
);
stg.stg_user_event_properties_hist_1
(
id bigint,
source varchar(20),
time_of_txn timestamp,
product varchar(50),
region varchar(50),
city varchar(100),
state varchar(100),
zip varchar(10),
price integer,
category varchar(50),
model varchar(50),
origin varchar(50),
l_code varchar(10),
d_name varchar(100),
d_id varchar(10),
medium varchar(255),
network varchar(255),
campaign varchar(255),
creative varchar(255),
event varchar(255),
property_name varchar(100),
property_value varchar(4000),
source_file_name varchar(255),
etl_batch_id integer,
etl_row_id varchar(20),
load_date timestamp
);
The differences again etl_row_id
has data type varchar(20) in _1, integer in the other table, and the first table has a sortkey on source column.
What would be the explanation for the size difference?
UPDATE: The problem was both compression and sort keys, even though _1 table created with CTAS 11 of 26 had different compression settings, also the first table was created with Compound SortKey of 14 columns, recreated the table with no sort keys (it's a history table after all) size went down to 231GB.
VACUUM
on both tables to remove deleted/updated rows? – John Rotenstein