3
votes

I have read in a few places that there are 3 levels of caching in Snowflake:

  1. Metadata cache. Maintained in the Global Service Layer. This includes metadata relating to micro-partitions such as the minimum and maximum values in a column, number of distinct values in a column. This enables queries such as SELECT MIN(col) FROM table to return without the need for a virtual warehouse, as the metadata is cached.

  2. Query Result Cache. This is also maintained by the global services layer, and holds the results set from queries for 24 hours (which is extended by 24 hours if the same query is run within this period).

  3. Warehouse data cache. This is maintained by the query processing layer in locally attached storage (typically SSDs) and contains micro-partitions extracted from the storage layer.

https://www.linkedin.com/pulse/caching-snowflake-one-minute-arangaperumal-govindsamy/

Then I also read in the Snowflake documentation that these caches exist:

  1. Result Cache: This holds the results of every query executed in the past 24 hours. These are available across virtual warehouses, so query results returned to one user is available to any other user on the system who executes the same query, provided the underlying data has not changed.

  2. Local Disk Cache. This is used to cache data used by SQL queries. Whenever data is needed for a given query it's retrieved from the Remote Disk storage, and cached in SSD and memory.

  3. Remote Disk Cache. This holds the long term storage. This level is responsible for data resilience, which in the case of Amazon Web Services, means 99.999999999% durability. Even in the event of an entire data centre failure.

https://community.snowflake.com/s/article/Caching-in-Snowflake-Data-Warehouse

What is the correspondence between these ? Both have the Query Result Cache, but why isn't the metadata cache mentioned in the snowflake docs ? And is the Remote Disk cache mentioned in the snowflake docs included in Warehouse Data Cache (I don't think it should be.

So are there really 4 types of cache in Snowflake?:

  1. Metadata Cache
  2. Query Result Cache
  3. Local Disk Cache
  4. Remote Disk Cache
2

2 Answers

4
votes

Metadata cache - The Cloud Services layer does hold a metadata cache but it is used mainly during compilation and for SHOW commands. It's a in memory cache and gets cold once a new release is deployed. The other caches are already explained in the community article you pointed out.

4
votes

Snwoflake has 3 types of caches:

  1. Metadata cache

  2. Results cache

  3. Warehouse cache has the synonyms:

    • Raw Data Cache
    • SSD Cache
    • Data Cache
    • Local (Disk)

"Remote (Disk)" is not the cache but Long term centralized storage.