
Snowflake is basically a cloud data warehouse. Its basic unit of work is the query, so to monitor load we need to sample query activity at a regular interval.

Two Available Options:

1.  INFORMATION_SCHEMA Schema
2.  ACCOUNT_USAGE Schema

Both of these schemas contain the QUERY_HISTORY view, so a monitoring dashboard can be built from either. But we need to understand the limitations of each schema; our findings are below.

QUERY_HISTORY VIEW IN:

INFORMATION_SCHEMA: Holds data for a 14-day interval or 10,000 records, whichever limit is reached first. Because those 10K rows fill at whatever rate traffic dictates, you cannot predict whether the returned data covers the desired duration, so results from INFORMATION_SCHEMA can be inconsistent.

ACCOUNT_USAGE: Holds data for 1 year, with a latency of up to 45 minutes.
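
As a minimal sketch of querying both sources (assuming the Python Snowflake connector; the connection details, warehouse, and column list are placeholders):

```python
import snowflake.connector

# Placeholder connection details; INFORMATION_SCHEMA needs a database in scope.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="MONITOR_WH", database="MY_DB",
)
cur = conn.cursor()

# INFORMATION_SCHEMA.QUERY_HISTORY is a table function: near real-time,
# but capped at 10,000 rows no matter what time range you request.
cur.execute("""
    SELECT query_id, start_time, total_elapsed_time, execution_status
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
        END_TIME_RANGE_START => DATEADD('minute', -10, CURRENT_TIMESTAMP()),
        RESULT_LIMIT => 10000))
""")
recent = cur.fetchall()

# ACCOUNT_USAGE.QUERY_HISTORY is a plain view: a year of history,
# but rows may lag behind real time by up to 45 minutes.
cur.execute("""
    SELECT query_id, start_time, total_elapsed_time, execution_status
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
""")
historical = cur.fetchall()
```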

Conclusion: If the analysis must be continuous (near real-time), use INFORMATION_SCHEMA.

If the analysis can tolerate a delay of up to 45 minutes, use the ACCOUNT_USAGE schema.


1 Answer


Using an in-house tool, we poll INFORMATION_SCHEMA at a 10-minute resolution. I think we even keep polling if the current block does not reach the current time, advancing the batch's start time whenever we get the full 10K results back.
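
My reading of that loop, as a hedged Python sketch (function names and the boundary handling are illustrative, and it reuses the cursor from the sketch above):

```python
import time
from datetime import datetime, timedelta, timezone

POLL_INTERVAL = 600   # seconds; the 10-minute resolution
PAGE_SIZE = 10_000    # hard cap per QUERY_HISTORY call

def fetch_window(cur, start, end):
    # One call to the table function for [start, end); at most PAGE_SIZE rows.
    cur.execute("""
        SELECT query_id, end_time, total_elapsed_time
        FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
            END_TIME_RANGE_START => %s,
            END_TIME_RANGE_END   => %s,
            RESULT_LIMIT         => %s))
        ORDER BY end_time
    """, (start, end, PAGE_SIZE))
    return cur.fetchall()

def poll(cur):
    window_start = datetime.now(timezone.utc) - timedelta(seconds=POLL_INTERVAL)
    while True:
        now = datetime.now(timezone.utc)
        batch = fetch_window(cur, window_start, now)
        rows = list(batch)
        # A full page means the window wasn't covered: advance the start
        # past the last row seen and fetch again (dedupe boundary rows
        # by query_id downstream).
        while len(batch) == PAGE_SIZE:
            batch = fetch_window(cur, batch[-1][1], now)
            rows.extend(batch)
        yield rows
        window_start = now
        time.sleep(POLL_INTERVAL)
```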

We then push this into InfluxDB so we can graph and alert on it. The major downside is that the query to fetch the history is itself a query: if you are experiencing queue times (and your monitoring uses the same warehouse), the monitoring also sits in the queue. An older problem: if you saturate your account connection limits, you cannot get in to see what is saturating the account, and in that state you cannot log into the UI either. To get around this we put limits on all our tooling, so it never hits those limits.
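
A sketch of the InfluxDB side and the self-imposed guard rail, assuming the influxdb-python client (the measurement name, row shape, and timeout value are illustrative, and `cur` is the cursor from the polling sketch):

```python
from influxdb import InfluxDBClient

influx = InfluxDBClient(host="localhost", port=8086, database="snowflake")

def publish(rows):
    # One point per query; graph and alert on counts and elapsed time.
    points = [{
        "measurement": "snowflake_queries",
        "time": end_time.isoformat(),
        "tags": {"query_id": query_id},
        "fields": {"elapsed_ms": float(elapsed)},
    } for query_id, end_time, elapsed in rows]
    influx.write_points(points)

# Guard rail: cap how long our own monitoring queries may run, so the
# tooling itself can never saturate the warehouse or account limits.
cur.execute("ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 60")
```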

We are currently running ~300-400K queries a day, and the above method is working well for us.