We plan to import OLTP relational tables into AWS Redshift. The CustomerTransaction table joins to multiple lookup tables. I only included 3 tables here, but we have more.

What should the sort key be on the CustomerTransaction table? In regular SQL Server, we have nonclustered indexes on the foreign keys in the CustomerTransaction table. For AWS Redshift, should I use a compound sort key or an interleaved sort key on the foreign key columns in CustomerTransaction? What is the best indexing strategy for this table design? Thanks.

create table dbo.CustomerTransaction
(
    CustomerTransactionId bigint primary key identity(1,1),
    ProductTypeId bigint,       -- foreign key to ProductType table
    StatusTypeId bigint,        -- foreign key to StatusType table
    DateOfPurchase date,
    PurchaseAmount float,
    ....
)

create table dbo.ProductType
(
    ProductTypeId bigint primary key identity(1,1),
    ProductName varchar(255),
    ProductDescription varchar(255),
    .....
)

create table dbo.StatusType
(
    StatusTypeId bigint primary key identity(1,1),
    StatusTypeName varchar(255),
    StatusDescription varchar(255),
    .....
)

1 Answer

The general rules of thumb are:

  • Set the DISTKEY based on the column you most commonly JOIN on or GROUP BY
  • Set the SORTKEY based on the columns you most commonly use in WHERE clauses
  • Avoid Interleaved Sort Keys (they are only optimal in rare circumstances and require frequent VACUUM)
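
As a rough illustration of the first two rules, here is a minimal sketch of how the question's CustomerTransaction table might be declared in Redshift. It assumes, purely for illustration, that queries most often filter on DateOfPurchase and most often join or group on ProductTypeId. Neither assumption comes from the question, and a DISTKEY with few distinct values can skew rows across slices, so treat the key choices as placeholders for your real access pattern. The columns elided in the question are omitted, and the dbo schema is dropped.

```sql
-- Sketch only: the key choices below are assumptions, not recommendations.
CREATE TABLE CustomerTransaction
(
    CustomerTransactionId BIGINT IDENTITY(1,1),
    ProductTypeId         BIGINT,             -- joins to ProductType
    StatusTypeId          BIGINT,             -- joins to StatusType
    DateOfPurchase        DATE,
    PurchaseAmount        DOUBLE PRECISION,
    PRIMARY KEY (CustomerTransactionId)       -- informational only; Redshift does not enforce it
)
DISTSTYLE KEY
DISTKEY (ProductTypeId)                       -- assumed most common JOIN / GROUP BY column
COMPOUND SORTKEY (DateOfPurchase);            -- assumed most common WHERE filter
```

A compound sort key lists columns in priority order, so the leading column should be the one that queries filter on most often.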

From Choose the Best Distribution Style - Amazon Redshift:

  • Distribute the fact table and one dimension table on their common columns
  • Choose the largest dimension based on the size of the filtered data set
  • Choose a column with high cardinality in the filtered result set
  • Change some dimension tables to use ALL distribution
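
The last point often applies to small lookup tables like the ones in the question. A minimal sketch, assuming ProductType and StatusType are small and change rarely (the question does not state their sizes), and assuming ProductTypeId is the key column of ProductType, as the foreign key in CustomerTransaction suggests:

```sql
-- Sketch only: assumes both lookup tables are small and slow-changing.
CREATE TABLE ProductType
(
    ProductTypeId      BIGINT IDENTITY(1,1),
    ProductName        VARCHAR(255),
    ProductDescription VARCHAR(255),
    PRIMARY KEY (ProductTypeId)               -- informational only in Redshift
)
DISTSTYLE ALL;                                -- full copy on every node, so joins need no redistribution

CREATE TABLE StatusType
(
    StatusTypeId      BIGINT IDENTITY(1,1),
    StatusTypeName    VARCHAR(255),
    StatusDescription VARCHAR(255),
    PRIMARY KEY (StatusTypeId)
)
DISTSTYLE ALL;
```

ALL distribution multiplies the storage a table needs and makes loads and updates slower, so it tends to suit only small tables that are not updated often.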

So it is not easy to recommend a particular DISTKEY and SORTKEY, because the right choice depends on how you query the tables. Seeing the DDL alone is not sufficient to recommend the best way to optimize them.
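
Once the tables have been loaded and queried for a while, the SVV_TABLE_INFO system view gives a rough check of whether the chosen keys fit the workload. A minimal sketch, assuming the table was created with an unquoted name, which Redshift stores in lowercase:

```sql
-- skew_rows: ratio of rows in the fullest slice to rows in the emptiest slice
-- unsorted:  percentage of rows not yet in sort key order
SELECT "table", diststyle, sortkey1, skew_rows, unsorted, tbl_rows
FROM svv_table_info
WHERE "table" = 'customertransaction';
```

A high skew_rows value suggests the DISTKEY is spreading rows unevenly, and a growing unsorted percentage suggests the table is due for a VACUUM.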

Other references: