Problem summary
I use a COPY query to load data from S3 to a Redshift table t1.
99% of the time there are no errors and the data is loaded correctly, so I know there's no mistake in the loaded table name.
But when there are errors, I can't always correctly track which error happened during the load of a specific table, because for some loaded tables like t1, the table identifier tbl found in stl_load_errors doesn't correspond to the one expected for table t1.
I filter stl_load_errors on both filename and tbl, because the same S3 file may be loaded into different tables, so filtering on filename alone isn't a safe way to check whether a specific COPY query produced errors.
Details
The loaded table name doesn't appear in stl_load_errors; instead, we get an integer identifier (e.g. 1234567) in the tbl column. To get the name, we have to join with the system table stv_tbl_perm, which contains both the name and the identifier. This "trick" is shown in the Redshift documentation.
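Concretely, the lookup I run looks something like this (a sketch; the filename is the one from my reproduction below, adjust it to your own):

```sql
-- Resolve the table name for a load error by joining on the tbl id
-- (join shown in the Redshift docs), filtering on both the S3 file
-- and the expected table name.
SELECT le.starttime,
       trim(tp.name) AS table_name,
       le.err_reason,
       le.raw_line
FROM stl_load_errors le
JOIN stv_tbl_perm tp ON tp.id = le.tbl
WHERE trim(le.filename) = 's3://my-bucket/redshift-load-error-table-id/unload/t1_000'
  AND trim(tp.name) = 't1';
```

For the problematic tables like t1 this returns no rows, because the tbl value in stl_load_errors never matches any id in stv_tbl_perm.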
For some tables, like t1, the error row I find in stl_load_errors after running COPY t1 ... has an identifier in the tbl column (say 1234567) that doesn't correspond to anything found in stv_tbl_perm, as if it were the identifier of a temporary table.
Then, when I look in stv_tbl_perm for "name"='t1', I do find an identifier (say 10111213) in the id column, but it's not the one I saw in stl_load_errors.
What makes it even more confusing is that this method of filtering errors on file and table works perfectly fine for some of my Redshift tables: the id found is the one expected and matches the correct name in stv_tbl_perm.
Why would my COPY to a table go through what looks like a temporary table for some tables, and not for others?
Reproduction
I simplified my process as much as possible for sharing, and managed to reproduce the issue as follows.
1. Create the table, insert a row so it's not empty, and check it.
CREATE TABLE IF NOT EXISTS public.t1
(
id INTEGER,
name VARCHAR(36),
price NUMERIC(6,2)
);
INSERT INTO t1 VALUES (1, 'paul', 10.50);
SELECT * FROM t1;
2. Create an S3 file with a deliberate type error
I write a character into the id column so that it causes a type error during loading.
unload ('select ''a'' as "id", ''pierre'' as "name", 2.50 as "price"')
to 's3://my-bucket/redshift-load-error-table-id/unload/t1_'
iam_role 'arn:aws:iam::1111111111:role/my-user'
parallel off
delimiter ',';
Make sure to replace my-bucket, 1111111111 and my-user with your own values.
3. Load this file into the table using COPY
COPY t1
from 's3://my-bucket/redshift-load-error-table-id/unload/t1_000'
iam_role 'arn:aws:iam::1111111111:role/my-user'
CSV;
The query fails as expected, generating a row in stl_load_errors.
4. Check for the table id in the new error row in stl_load_errors
SELECT *
FROM stl_load_errors
where trim(filename) = 's3://my-bucket/redshift-load-error-table-id/unload/t1_000';
We find the row corresponding to the file and note the table identifier in the tbl column, let's say "tbl" = 1234567 (you'll get a different value).
5. Look up this table identifier in stv_tbl_perm to get the table name
select *
from stv_tbl_perm
where id = 1234567;
This returns nothing: the identifier found doesn't correspond to any permanent table listed in stv_tbl_perm, as if it were the identifier of a temporary table.
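One way to probe that hypothesis (a sketch, and it may well come back empty: stv_tbl_trans only keeps rows while the transient table still exists):

```sql
-- stv_tbl_trans lists transient tables the system creates during
-- query processing. Replace 1234567 with the tbl value you got
-- from stl_load_errors.
select *
from stv_tbl_trans
where id = 1234567;
```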
6. Look up the expected identifier of the table we loaded to
select *
from stv_tbl_perm
where name='t1';
This returns 10111213 (you'll get a different value), which is the identifier I should have gotten in the tbl column of stl_load_errors.
If I run another COPY with an error, another row appears in stl_load_errors, again with an identifier different from both previous ones. This also suggests it could be the identifier of a temporary table.
Consequence
I therefore cannot reliably filter stl_load_errors for the specific table my COPY query targeted.
Surprisingly, this process works perfectly fine for some of my tables, and I can't find any difference in how I load the tables for which it works and those for which it doesn't.
I haven't found any reference to this issue on the web yet; it takes a lot of detail to describe, so it's hard to search for, too.
Any idea?