0
votes

This might be more of snowflake knowledge question rather than issue. I am running a copy command from s3 to snowflake. and i see it took 30 min to get 100 the byte scanned however even after having byte scanned to 100 percent it took another 40 min for the query to be completed.

Can somebody please explain whats going on here, as this way i feel it is tough to estimate how much time any running copy command might take while looking into the history screen.

2
Can you provide the COPY INTO command, the size of the warehouse, the size of the files, and the number of files being loaded? All of these are factors in loading data into Snowflake.Mike Walton

2 Answers

3
votes

Sounds like you're referring to the 100% in the Bytes Scanned column of the query profile. If you have transformations in your COPY INTO command this will take additional time to process. As others have mentioned, the size of the warehouse will have an impact as the warehouse size will determine the number of cores and threads, which directly affect the parallelism of the writes.

In short the Bytes Scanned is just a measure of the total data read by Snowflake that will be processed by the job, but it still needs to process the job.

1
votes

We in the past we have found each xsmall can load 40mb/s from S3, and thus a small can load 2x. So that's our base line expectation of load speeds.

What can legitimately slow down a copy is if you are coping from the root of the bucket s3://buck_name/ but there are millions of file in the that directory, with only one new 100 byte file. But I suspect that is also not the case.

The next thing it might be is failure to run the query part, which in the profile would have multiple profile stage tabs of the likes of 1 \ 1001 \ 2002 which the increment on the stage in the thousands indicating the query failed to execute and that it was re-run. This sometimes can be due to the warehouse get corrupted, and sometime due to the new run-time of the current release failing, and the retry's can be running on older releases to see if those succeed. But there are often clues to some of this, with time being seen "spilling to internal/external storage" is something we have seen when bugs occurs.

But in reality if things are seeming "really" strange, I would open a support ticket, and ask for an explanation of what is happening. With the usual, this is what I am seeing, this is why I think it's strange..