1 vote

(Submitting for a Snowflake User, hoping to receive additional assistance)


Is there a faster way to insert rows into a table from a stored procedure? I started building a usp intended to insert a million or so rows of test data into a table for load testing.

I got to the stage shown below and set the iteration value to 10,000.

It took over 10 minutes to run 10,000 iterations, inserting a single integer into the table on each pass.

Yes, I am using an XS warehouse, but even if it were scaled up to the maximum size, this would still be far too slow to be of any use.

--build a test table

CREATE OR REPLACE TABLE myTable
(
   myInt NUMERIC(18,0)
);

--testing a JS usp that uses a WHILE loop to insert many rows (millions) into a table for load testing

CREATE OR REPLACE PROCEDURE usp_LoadTable_test()
RETURNS float
LANGUAGE javascript
EXECUTE AS OWNER
AS
$$
//set the number of iterations
   var maxLoops = 10;
//set the row pointer
   var rowPointer = 1;
//set the INSERT sql statement
   var sql_insert = 'INSERT INTO myTable VALUES(:1);';
//insert the first value
   var sf_startInt = rowPointer + 1000;
   var resultSet = snowflake.execute( {sqlText: sql_insert, binds: [sf_startInt]} );
//loop through to insert all other values
   while (rowPointer < maxLoops)
   {
       rowPointer += 1;
       sf_startInt = rowPointer + 1000;
       resultSet = snowflake.execute( {sqlText: sql_insert, binds: [sf_startInt]} );
   }

   return rowPointer;
$$;
CALL usp_LoadTable_test();
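
For comparison, here is a minimal sketch of the same loop that batches 1,000 literals into each INSERT to cut the per-statement round trips (the procedure name and chunk size are my own assumptions, not part of the original):

CREATE OR REPLACE PROCEDURE usp_LoadTable_batched()
RETURNS float
LANGUAGE javascript
EXECUTE AS OWNER
AS
$$
   var maxRows = 10000;
   var chunkSize = 1000;   //rows per INSERT statement (assumed value)
   var inserted = 0;
   while (inserted < maxRows)
   {
       //build the literal list (1001),(1002),... for the next chunk
       var values = [];
       var n = Math.min(chunkSize, maxRows - inserted);
       for (var i = 1; i <= n; i++)
       {
           values.push('(' + (inserted + i + 1000) + ')');
       }
       //one round trip inserts the whole chunk
       snowflake.execute( {sqlText: 'INSERT INTO myTable VALUES ' + values.join(',') + ';'} );
       inserted += n;
   }
   return inserted;
$$;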

So far, I've received the following recommendations:

Recommendation #1

One thing you can do is use a "feeder table" containing 1,000 or more rows instead of INSERT ... VALUES, e.g.:

INSERT INTO myTable SELECT <some transformation of columns> FROM "feeder table"
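
A minimal sketch of that idea (the feeder table name and its seeding via GENERATOR are assumptions; any 1,000-row table works):

--seed a 1,000-row feeder table
CREATE OR REPLACE TABLE feeder AS
SELECT SEQ4() AS n
FROM TABLE(GENERATOR(ROWCOUNT => 1000));

--one set-based statement then yields 1,000,000 rows (1,000 x 1,000)
INSERT INTO myTable
SELECT a.n * 1000 + b.n + 1001
FROM feeder AS a
CROSS JOIN feeder AS b;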

Recommendation #2

When you perform a million single-row inserts, you create one million micro-partitions, each up to 16 MB.

That potential 16 TB of storage might be visible on your Snowflake bill ... and normal tables are retained for a minimum of 7 days after they are dropped.

To optimize storage, you could define a clustering key and load the table in ascending order, with each chunk filling as much of a micro-partition as possible.
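
A sketch of that (staging_rows below is a hypothetical source table holding the data to append):

--cluster on the load column, then append in ascending key order
ALTER TABLE myTable CLUSTER BY (myInt);

INSERT INTO myTable
SELECT myInt FROM staging_rows ORDER BY myInt;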


Recommendation #3

Use the data generation functions, which are very fast, if you need sequential integers: https://docs.snowflake.net/manuals/sql-reference/functions/seq1.html
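
In practice that can look like this (a sketch; ROW_NUMBER is used because the docs note that raw SEQ values can contain gaps):

--generate 1,000,000 sequential integers in one set-based statement
INSERT INTO myTable
SELECT ROW_NUMBER() OVER (ORDER BY SEQ8()) + 1000
FROM TABLE(GENERATOR(ROWCOUNT => 1000000));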


Any other ideas?

1 Answer

1 vote

This question was also asked at the Snowflake Lodge some weeks ago. If the answers you received there still leave you feeling unanswered, could you hint at why?

If you just want a table with a single column of sequence numbers, use GENERATOR() as in #3 above. Otherwise, if you want more advice, share your specific requirements.
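
For the single-column case, a minimal sketch of the GENERATOR() approach (the row count and starting value are placeholders for whatever you need):

--build the whole table in one CTAS
CREATE OR REPLACE TABLE myTable AS
SELECT ROW_NUMBER() OVER (ORDER BY SEQ8()) + 1000 AS myInt
FROM TABLE(GENERATOR(ROWCOUNT => 1000000));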