I created two tables with 43,547,563 rows each:
CREATE TABLE metrics_compressed (
some_id bigint ENCODE ZSTD,
some_value varchar(200) ENCODE ZSTD distkey,
...,
some_timestamp bigint ENCODE ZSTD,
...,
PRIMARY KEY (some_id, some_timestamp, some_value)
)
sortkey (some_id, some_timestamp);
The second one is exactly like the first one but without any column compressed.
Running this query (it just counts one row):
select count(*)
from metrics_compressed
where some_id = 20906
and some_timestamp = 1475679898584;
shows a table scan of 42,394,071 rows (from the rows_pre_filter column in svl_query_summary, column is_rrscan true) and while running it over the uncompressed table it scans 3,143,856. I guess the reason for this is that the compressed one uses less 1MB blocks, hence the scan shows the total number of rows from the retrieved blocks.
Are the scanned rows a sign of bad performance? Or does Redshift use some kind of binary search within a block for such simple queries as this one, and the scanned rows is just confusing info for optimizing queries?