(Submitting on behalf of a Snowflake User)
QUESTION: Why would the filter or the search key(key used in where clause) would be a better choice for cluster key than an order by or group by key.
One resource recommends reading: https://support.snowflake.net/s/article/case-study-how-clustering-can-improve-your-query-performance
Another resource mentions:
The performance of query filter will be better because the data is sorted it would skip all the rows which are not required.
For the scenario which has query filter on columns which are not part of sort order but the columns in group by and order by are part of data sort order (clustered keys), it may take time to select those data but the sorting would be easy since the data is already in an order.
A 3rd resource states:
The clustering key is important for the WHERE clause when you only select a small portion of the overall data that you have in your tables, because it can reduce the amount of data that has to be read from the Storage into the Compute when the Optimizer can use the clustering key for Query Pruning.
You can alternatively use the clustering key to optimize table inserts and possibly also query output (eg sort order).
Your choice should depend on your priorities, there is no cure all unless a single key covers all above.
To which the User responds with the following questions:
If I always insert the rows in the order in which they will be retrieved, do I still need to create a cluster key? For example if a table is always queried using a date_timestamp and if I ensure that I am inserting in the table order by date_timestamp, do I still need to create a cluster key on date_timestamp?
Any thoughts, recommendations, etc.? Thanks!