(Submitting for a Snowflake User, hoping to receive additional assistance)
Is there another way to perform table insertion using a stored procedure faster? I started building a usp with the purpose to insert million or so of rows of test data into a table for the purpose of load testing.
I got to this stage show below and set the iteration value to 10,000.
This took over 10 mins to iterate 10,000 times to insert a single integer into a table each iteration
Yes - I am using a XS data warehouse, but even if this is increased to MAX - this is way to slow to be of any use.
--build a test table
CREATE OR REPLACE TABLE myTable
(
myInt NUMERIC(18,0)
);
--testing a js usp using a while statement with the intention to insert multiple rows into a table (Millions) for load testing
CREATE OR REPLACE PROCEDURE usp_LoadTable_test()
RETURNS float
LANGUAGE javascript
EXECUTE AS OWNER
AS
$$
//set the number of iterations
var maxLoops = 10;
//set the row Pointer
var rowPointer = 1;
//set the Insert sql statement
var sql_insert = 'INSERT INTO myTable VALUES(:1);';
//Insert the fist Value
sf_startInt = rowPointer + 1000;
resultSet = snowflake.execute( {sqlText: sql_insert, binds: [sf_startInt] });
//Loop thorugh to insert all other values
while (rowPointer < maxLoops)
{
rowPointer += 1;
sf_startInt = rowPointer + 1000;
resultSet = snowflake.execute( {sqlText: sql_insert, binds: [sf_startInt] });
}
return rowPointer;
$$;
CALL usp_LoadTable_test();
So far, I've received the following recommendations:
Recommendation #1
One thing you can do is to use a "feeder table" containing 1000 or more rows instead of INSERT ... VALUES, eg:
INSERT INTO myTable SELECT <some transformation of columns> FROM "feeder table"
Recommendation #2
When you perform a million single row inserts, you consume one million micropartitions - each 16MB.
That 16 TB chunk of storage might be visible on your Snowflake bill ... Normal tables are retained for 7 days minimum after drop.
To optimize storage, you could define a clustering key and load the table in ascending order with each chunk filling up as much of a micropartition as possible.
Recommendation #3
Use data generation functions that work very fast if you need sequential integers: https://docs.snowflake.net/manuals/sql-reference/functions/seq1.html
Any other ideas?