Going through the documentation, I found cross-region replication for the storage layer, but nothing about the compute layer of Snowflake. I did not see any mention of availability options for virtual warehouses. If a whole AWS region goes down, the database will still be available for serving queries, but what about the virtual warehouse? Do I need to create a new one while the region is still down, or is there a way to have a "backup" virtual warehouse in a different AWS region?
3 Answers
A virtual warehouse is essentially a compute server (for example, AWS EC2 if hosted on AWS). Virtual warehouses are not persistent: when you suspend a warehouse, it is returned to the AWS/Azure/GCP pool, and when you resume it, it is allocated from the pool again. When a cloud region goes down, virtual warehouses will be allocated and created from the AWS/Azure/GCP pool in the backup region.
The documentation here states clearly what can, and what can't, be replicated: Snowflake Replication
For example, it states:
Currently, replication is supported for databases only. Other types of objects in an account cannot be replicated. This list includes:
- Users
- Roles
- Warehouses
- Resource monitors
- Shares
When you set up an environment (into which you are going to replicate a database from another account) you also need to set up the roles, users, warehouses, etc.
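As a rough sketch of that setup step, the snippet below builds the SQL statements you might run in the secondary account to create a standby warehouse and a role that can use it. The names `FAILOVER_WH` and `ANALYST`, the helper function, and the size and auto-suspend values are all hypothetical, chosen only for illustration. The statements are only generated here, not executed, so you would still need to run them through whatever client you use.

```python
import re

# Plain unquoted identifiers only; this is a deliberately strict check.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def standby_setup_statements(warehouse, role, size="XSMALL", auto_suspend_s=60):
    """Return SQL statements to run in the secondary account.

    All names and defaults are hypothetical examples, not values
    taken from the Snowflake documentation quoted above.
    """
    for name in (warehouse, role):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return [
        # Roles are not replicated, so recreate the role first.
        f"CREATE ROLE IF NOT EXISTS {role}",
        # Create the warehouse suspended so it costs nothing until used.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_s} "
        "AUTO_RESUME = TRUE "
        "INITIALLY_SUSPENDED = TRUE",
        # Let the role run queries on the standby warehouse.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
    ]


if __name__ == "__main__":
    for stmt in standby_setup_statements("FAILOVER_WH", "ANALYST"):
        print(stmt + ";")
```

Creating the warehouse with `INITIALLY_SUSPENDED = TRUE` and `AUTO_RESUME = TRUE` means the standby should consume no credits until the first query arrives after a failover.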
The paper 'The Snowflake Elastic Data Warehouse' (2016), in section 4.2.1 'Fault Resilience', states:
Virtual Warehouses (VWs) are not distributed across AZs. This choice is for performance reasons. High network throughput is critical for distributed query execution, and network throughput is significantly higher within the same AZ. If one of the worker nodes fails during query execution, the query fails but is transparently re-executed, either with the node immediately replaced, or with a temporarily reduced number of nodes. To accelerate node replacement, Snowflake maintains a small pool of standby nodes. (These nodes are also used for fast VW provisioning.) If an entire AZ becomes unavailable though, all queries running on a given VW of that AZ will fail, and the user needs to actively re-provision the VW in a different AZ. With full-AZ failures being truly catastrophic and exceedingly rare events, we today accept this one scenario of partial system unavailability, but hope to address it in the future.