We have customer data that is sharded by company ID. No company's data is ever mixed with another company's data, so company ID was chosen as the distkey.
Should the company ID be the first column in the sortkey, given that a single node may hold data for several thousand companies? Or does the distkey already narrow the data down to a given company before scanning starts?
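To make the two options concrete, here is a hypothetical pair of Amazon Redshift table definitions. Only `company_id` and the idea of a `sales` table come from the question; the table names `sales_by_company` and `sales_by_date` and the columns `sale_id`, `sale_date`, and `amount` are illustrative assumptions. Both use `company_id` as the distkey and differ only in the sortkey.

```sql
-- Option A (hypothetical): company_id is the leading column of a compound
-- sortkey, so within the sorted region of each slice a company's rows are
-- stored next to each other.
CREATE TABLE sales_by_company (
    company_id INTEGER        NOT NULL,
    sale_id    BIGINT         NOT NULL,
    sale_date  DATE           NOT NULL,
    amount     DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (company_id)
COMPOUND SORTKEY (company_id, sale_date);

-- Option B (hypothetical): same distkey, but sorted only by date, so rows
-- for different companies that land on the same slice are interleaved.
CREATE TABLE sales_by_date (
    company_id INTEGER        NOT NULL,
    sale_id    BIGINT         NOT NULL,
    sale_date  DATE           NOT NULL,
    amount     DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (company_id)
SORTKEY (sale_date);
```

The question is, in effect, whether a filter such as `company_id = 123` behaves differently on these two layouts once the query reaches the node that holds that company's data.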
If I run `SELECT COUNT(*) FROM sales WHERE company_id = 123`, it will know which node to run the query on. But will it then need to scan that whole node to find the matching records (in which case company_id should be in the sortkey), or is the data on the node already segmented by individual company_id (in which case the sortkey is not needed)? - Elliot Chance