In Snowflake, is there any option to execute SQL statements in parallel inside a stored procedure? I have a stored procedure (shown below) with 35 SQL statements that run sequentially. We are planning to reduce the run time by executing all of these in parallel.

What would be the best approach to achieve this? (All I could think of is to create 35 stored procedures and call all of them from a scheduler at the same time.) Is there a better approach or a Snowflake feature to achieve this?

create or replace procedure SP_TEST()
  returns string
  language javascript
  execute as CALLER
  as
  $$
    try {
      var step_num = 0;

      step_num++; //--#1
      var sql_statement1 = snowflake.createStatement( {sqlText: `INSERT INTO TGT_TBL select * from TBL_A`} );
      sql_statement1.execute();

      step_num++; //--#2
      var sql_statement2 = snowflake.createStatement( {sqlText: `INSERT INTO TGT_TBL select * from TBL_B`} );
      sql_statement2.execute();

      return "Completed " + step_num + " steps to load into TGT_TBL";
    }
    catch (err) {
      throw(err);
    }
  $$
;
3 Answers

2 votes

It is sequential because you call the execute() method for each statement, and each call blocks until that query finishes.

To answer your question directly: achieving parallelization within a single procedure is not possible; as far as I know, there is no feature or specific method for it.

But in general, increasing your warehouse size is a good way of saving some time, as the queries may run faster. It's not parallel, but it is faster.
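A minimal sketch of that approach, assuming a warehouse named `MY_WH` (the warehouse name and sizes are placeholders):

```sql
-- Scale up before the heavy load, then scale back down to control cost.
alter warehouse MY_WH set warehouse_size = 'LARGE';
call SP_TEST();
alter warehouse MY_WH set warehouse_size = 'XSMALL';
```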

Scaling out (i.e. using a multi-cluster warehouse) doesn't provide benefits, in my opinion, when you use one single procedure: additional clusters help with many concurrent queries, not with one sequential workload.

2 votes

As an alternative to your idea of creating 35 tasks by hand, you can consider creating one task that calls a stored procedure, which in turn creates 35 tasks to run the individual queries. If the query SQL can be read from a table, or some part of it can be generated by JavaScript (such as table names with dates, e.g. CONNECT20200120), this could be a good automation.
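A hedged sketch of that idea, assuming a warehouse `MY_WH` and a hard-coded table list (`SP_CREATE_TASKS`, the task names, and the list are all hypothetical; in practice the list would come from a config table):

```sql
create or replace procedure SP_CREATE_TASKS()
  returns string
  language javascript
  execute as CALLER
  as
  $$
    // In a real setup, read these from a table instead of hard-coding.
    var tables = ['TBL_A', 'TBL_B'];
    for (var i = 0; i < tables.length; i++) {
      // Create a task per source table; no schedule, so it can be run on demand.
      var ddl = "create or replace task LOAD_" + tables[i] +
                " warehouse = MY_WH" +
                " as INSERT INTO TGT_TBL select * from " + tables[i];
      snowflake.createStatement({sqlText: ddl}).execute();
      // EXECUTE TASK starts the task immediately; the loads run independently.
      snowflake.createStatement({sqlText: "execute task LOAD_" + tables[i]}).execute();
    }
    return "Created and started " + tables.length + " tasks";
  $$
;
```

Because each task runs as its own job, the INSERTs are no longer serialized by the procedure's execute() calls.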

0 votes

Split your SP into several smaller procedures (the ones that should run in parallel) and use Snowflake's dependent tasks so that the different parts run after the main part is done.
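A minimal sketch of that task graph, assuming a warehouse `MY_WH` and hypothetical procedure/task names (the cron schedule is a placeholder):

```sql
-- Root task runs the main part on a schedule.
create or replace task MAIN_TASK
  warehouse = MY_WH
  schedule = 'USING CRON 0 2 * * * UTC'
  as call SP_MAIN();

-- Child tasks run concurrently once MAIN_TASK finishes.
create or replace task LOAD_A
  warehouse = MY_WH
  after MAIN_TASK
  as INSERT INTO TGT_TBL select * from TBL_A;

create or replace task LOAD_B
  warehouse = MY_WH
  after MAIN_TASK
  as INSERT INTO TGT_TBL select * from TBL_B;

-- Tasks are created suspended; resume the children before the root.
alter task LOAD_A resume;
alter task LOAD_B resume;
alter task MAIN_TASK resume;
```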