14
votes

In the Snowflake documentation, I could not find a reference to using Indexes.

Does Snowflake support Indexes and, if not, what is the alternative approach to performance tuning when using Snowflake?

5

5 Answers

20
votes

Snowflake does not use indexes. This is one of the things that makes Snowflake scale so well for arbitrary queries. Instead, Snowflake calculates statistics about columns and records in files that you load, and uses those statistics to figure out what parts of what tables/records to actually load to execute a query. It also uses a columnar store file format, that lets it only read the parts of the table that contain the fields (columns) you actually use, and thus cut down on I/O on columns that you don't use in the query.

Snowflake slices big tables (gigabyte, terabyte or larger) into smaller "micro partitions." For each micro partition, it collects statistics about what value ranges each column contains. Then, it only loads micro partitions that contain values in the range needed by your query. As an example, let's say you have a column of time stamps. If your query asks for data between June 1 and July 1, then partitions that do not contain any data in this range, will not be loaded or processed, based on the statistics stored for dates in the micropartition files.

Indexes are often used for online transaction processing, because they accelerate workflows when you work with one or a few records, but when you run analytics queries on large datasets, you almost always work with large subsets of each table in your joins and aggregates. The storage mechanism, with automatic statistics, automatically accelerates such large queries, with no need for you to specify an index, or tune any kind of parameters.

19
votes

Snowflake does not support indexes, though it does support "clustering" for performance improvements of I/O.

I recommend reading these links to get familiar with this:

https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html

https://docs.snowflake.net/manuals/user-guide/tables-auto-reclustering.html

Here's a really good blog post on the topic as well: https://www.snowflake.com/blog/automatic-query-optimization-no-tuning/

Hope this helps...Rich

6
votes

No Snowflake does not have indexes. Its performance boosts come through by eliminating unnecessary scanning which it achieves my maintaining rich metadata in each of its micro partitions. For instance if you have a time filter in your query and your table is more or less sorted by time, then Snowflake can "prune" away the parts of the table that are not relevant to the query.

Having said this, Snowflake is constantly releasing new features and one such feature is its Search Optimisation Service which allows you to perform "needle in a hay stack" queries on selected columns that you enable. Not quite indexes that you can create, but something like that being used behind the scenes perhaps.

4
votes

No, Snowflake doesn't support indexes. And don't let them tell you that this is an advantage. Performance tuning can be done as described above, but is often is done with money: Pay for bigger warehouses.

2
votes

Snowflake doesn't support indexes, it keeps data in micro partition or in another sense it breaks data sets in small files and format rows to column and compress them. Snowflake metadata manager in service layer will have all the information about each micro partition like which partition have which data. Each partition will have information about itself in header like max value, min value, cardinality etc. this is much better then indexes as compare to conventional databases.