A quick summary of the commands, from the VACUUM
documentation:
VACUUM
: Sorts the specified table (or all tables in the current database) and reclaims disk space occupied by rows that were marked for deletion by previous UPDATE and DELETE operations. A full vacuum doesn't perform a reindex for interleaved tables.
VACUUM REINDEX
: Analyzes the distribution of the values in interleaved sort key columns, then performs a full VACUUM operation.
ANALYZE
: Updates table statistics for use by the query planner.
It is good practice to perform an ANALYZE
when significant quantities of data have been loaded into a table. In fact, Amazon Redshift will automatically skip the analysis if less than 10% of data has changed, so there is little harm in running ANALYZE
.
You mention that some tables get fully replaced every day. This should be done either by dropping and recreating the table, or by using TRUNCATE
. Emptying a table with DELETE *
is less efficient and should not be used to empty a table.
VACUUM
can take significant time. In situations where data is being appended in time-order and the table's SORTKEY
is based on the time, it is not necessary to vacuum the table. This is because the table is effectively sorted already. This, however, does not apply to interleaved sorts.
Interleaved sorts are more tricky. From the sort key documentation:
An interleaved sort key gives equal weight to each column in the sort key, so query predicates can use any subset of the columns that make up the sort key, in any order.
Basically, interleaved sorts use a fancy algorithm to sort the data so that queries based on any of the columns (individually or in combination) will minimize the number of data blocks that are required to be read from disk. Disk access always takes the most time in a database, so minimizing disk access is the best way to speed-up the database. Amazon Redshift uses Zone Maps to identify which blocks to read from disk and the best way to minimize such access is to sort data and then skip over as many blocks as possible when performing queries.
Interleaved sorts are less performant than normal sorts, but give the benefit that multiple fields are fairly well sorted. Only use interleaved sorts if you often query on many different fields. The overhead in maintaining an interleaved sort (via VACUUM REINDEX
) is quite high and should only be done if the reindex effort is worth the result.
So, in summary:
ANALYZE
after significant data changes
VACUUM
regularly if you delete data from the table
VACUUM REINDEX
if you use Interleaved Sorts and significant amounts of data have changed