We are ingesting data into Snowflake via the kafka connector. To increase the data read performance / scan less partitions we decided to add a clustering key to a a key / combination of keys stored in the RECORD_CONTENT variant field.
The data in the RECORD_CONTENT field looks like this:
{
"jsonSrc": {
"Integerfield": 1,
"SourceDateTime": "2020-06-30 05:33:08:345",
*REST_OF_THE_KEY_VALUE_PAIRS*
}
Now, the issue is that clustering on a datetime col like SourceDateTime does NOT work:
CLUSTER BY (to_date(RECORD_CONTENT:jsonSrc:loadDts::datetime))
...while clustering on a field like Integerfield DOES work:
CLUSTER BY (RECORD_CONTENT:jsonSrc:Integerfield::int )
Not working means: when using a filter on RECORD_CONTENT:jsonSrc:loadDts::datetime, it has no effect on the partitions scanned, while filtering on RECORD_CONTENT:jsonSrc:Integerfield::int does perform partition pruning.
What is wrong here? Is this a bug?
Note that:
- There is enough data to do meaningful clustering on RECORD_CONTENT:jsonSrc:loadDts::datetime
- I validated clustering on RECORD_CONTENT:jsonSrc:loadDts::datetime working by making a copy of the raw table, with RECORD_CONTENT:jsonSrc:loadDts::datetime in a seperate column loadDtsCol and then adding a similar clustering key on that column: to_date(loadDtsCol).