3
votes

I'm working on the data pipeline. In one of the steps CSV from S3 is consumed by RedShift DataNode. My RedShift table has 78 columns. Checked with:

SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'my_table';

After failed RedshiftCopyActivity 'stl_load_errors' table shows "Delimiter not found" (1214) error for line number 1, for column namespace (this is second column, varchar(255)) on position 0. Consumed CSV line looks like that:

0,my.namespace.string,2119652,458031,S,60,2015-05-02,2015-05-02 14:51:02,2015-05-02 14:51:14.0,1,Counter,1,Counter 01,91,Chaymae,0,,,,227817,1,Dine In,5788,2015-05-02 14:51:02,2015-05-02 14:51:27,17.45,0.00,0.00,17.45,,91,Chaymae,0,0.00,12,M,A,-1,13,F,0,0,2,2.50,F,1094055,Coleslaw Md Upt,8,Sonstige,900,Sides,901,Sides,0.00,0.00,0,,,0.0000,0,0,,,0.00,0.0000,0.0000,0,,,0.00,0.0000,,1,Woche Counter,127,Coleslaw Md Upt,2,2.50

After simple replacement ("," to "\n") I have 78 lines so it looks like the data should be matched... I'm stuck on that. Maybe someone knows how I can find more information about the error or see the solution?

EDIT

Query:

select d.query, substring(d.filename,14,20), 
d.line_number as line, 
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id(); 

results with 0 rows.

2

2 Answers

5
votes

I figured it out and maybe it will be useful for someone else:

There were in fact two problems.

  1. My first field in the redshift table was of the type INT IDENTITY(1,1) and in CSV I had 0 value there. After removing the first column from CSV, even without specified columns mapping everything was copied without a problem if...
  2. DELIMITER ',' commandOption was added to S3ToRedshiftCopyActivity to force using comma. Without it RedShift recognized dot from namespace (my.namespace.string) as delimiter.
0
votes

You need to add FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt'. AWS has not mentioned this already. Please note that this is only work when your data is in json form like

{'attr1': 'val1', 'attr2': 'val2'} {'attr1': 'val1', 'attr2': 'val2'} {'attr1': 'val1', 'attr2': 'val2'} {'attr1': 'val1', 'attr2': 'val2'}