I am currently working with DBs in Snowflake, and I have a performance problem.
In my query, I need to retrieve 100 rows of data from a MAIN_TABLE which contains foreign keys to other tables. MAIN_TABLE size is between 50K-300K rows.
This MAIN_TABLE contains ids (foreign keys) of type NUMBER(38,0), and the description of those fields/columns is retrieved from other tables with a straightforward INNER JOIN.
The secondary tables (table_one, table_two...) are very small, all of them have less than 20 rows, except one, that has around 1500 rows (still very small)
The problem is:
Running the query, as shown below, takes an average of 1.7s to complete. If I delete all the DESC fields (retrieved using INNER JOINs), and all the INNER JOINs, the query takes an average of 300ms to complete.
The current solution my peers implemented:
They have cached the information of all the secondary tables in our app (made in Java, but that is irrelevant). For about 80% of the cases (see query below), that we do not need to filter by DESCription, this works, but on the other 20% of cases we still have to perform the INNER JOINs.
Downsizes of the solution:
Is not a bad solution, but it has two problems:
- (not so bad) it bloats our codebase, is basically a "JOIN" made in Java
- the 20% of the time we need to filter by DESCriptions, we still need to do the INNER JOINs, so we still have a performance problem.
One possibility that comes to mind is caching those tables in SF, but I haven't found a straightforward way to do it yet. Maybe there is a way to optimize the query, but I do not yet understand how Snowflake works internally, as far as I know it does not uses Indexes, at least not in the same way other platforms do.
So, is there a way to optimize the query in SF for 100% of the cases?
SELECT
main_table.ONE_ID,
main_table.TWO_ID,
main_table.THREE_ID,
main_table.FOUR_ID,
main_table.FIVE_ID,
main_table.SIX_ID,
main_table.SEVEN_ID,
field_one.ONE_DESC,
field_two.TWO_DESC,
field_three.THREE_DESC,
field_four.FOUR_DESC,
field_five.FIVE_DESC,
field_six.SIX_DESC,
field_seven.SEVEN_DESC
FROM
SOME_DATABASE.MAIN_TABLE AS main_table
INNER JOIN SOME_DATABASE.TABLE_ONE AS table_one ON main_table.field_one_id = table_one .ONE_ID
INNER JOIN SOME_DATABASE.TABLE_TWO AS table_two ON main_table.field_two_id = table_two .TWO_ID
INNER JOIN SOME_DATABASE.TABLE_THREE AS table_tree ON main_table.field_tree_id = table_tree .THREE_ID
INNER JOIN SOME_DATABASE.TABLE_FOUR AS table_four ON main_table.field_four_id = table_four .FOUR_ID
INNER JOIN SOME_DATABASE.TABLE_FIVE AS table_five ON main_table.field_five_id = table_five .FIVE_ID
INNER JOIN SOME_DATABASE.TABLE_SIX AS table_six ON main_table.field_six_id = table_six .SIX_ID
INNER JOIN SOME_DATABASE.TABLE_SEVEN AS table_seven ON main_table.field_seven_id = table_seven .SEVEN_ID
WHERE
main_table.ONE_ID IN (25, 26)
AND main_table.TWO_ID IN (10, 12)
AND main_table.THREE_ID IN (1, 2, 3)
AND main_table.FOUR_ID IN (2, 3)
AND main_table.FIVE_ID IN (3)
AND main_table.SEVEN_ID IN (1)
-- The following WHERE clauses are present in about 20% of the queries
AND table_one.ONE_DESC,
AND table_two.TWO_DESC,
AND table_three.THREE_DESC,
ORDER BY
main_table.ONE_ID,
main_table.TWO_ID,
main_table.THREE_ID
LIMIT
100 OFFSET 0
Small update:
I have been trying using WITH clauses, both wrapping the ids and the description, but there does not seem to be an improvement