I have a Lambda function in Python that does some processing and returns values that need to be inserted into Redshift. In the Lambda, I push the values to Kinesis, which copies them to S3 and from there into Redshift.
The values in the Lambda end up in a single string, as follows:
final_string = 'a;b;c;d'
Each letter is the value for a different column in the Redshift table, hence the delimiter is ';'. I then push the data to the Kinesis stream with:
put_response = kinesis_client.put_record(
    StreamName='PixelTrack',
    Data=json.dumps(final_string),
    PartitionKey='first')
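For completeness, the relevant part of the Lambda looks roughly like this (the handler shape, the region, and the client setup here are placeholders around the real call above):

import json
import boto3

# The region here is a placeholder; the client is created outside the handler.
kinesis_client = boto3.client('kinesis', region_name='eu-west-1')

def lambda_handler(event, context):
    # ... the processing that builds the values happens here ...
    final_string = 'a;b;c;d'
    put_response = kinesis_client.put_record(
        StreamName='PixelTrack',
        Data=json.dumps(final_string),
        PartitionKey='first')
    return put_response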
Then the Kinesis stream feeds a Kinesis Firehose delivery stream. The file that Firehose generates in S3 looks like this (the quotes are in the file):
"a;b;c;d;c"
And finally I copy the data into Redshift with the following statement (configured in Kinesis Firehose):
COPY table
FROM blabla
BLANKSASNULL
DELIMITER ';'
EMPTYASNULL
NULL AS 'null'
ESCAPE
FILLRECORD;
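To debug, I have been re-running the equivalent COPY by hand from Python so I can reproduce the load error outside of Firehose. A minimal sketch, assuming psycopg2 and with placeholder connection details, table name, S3 path, and IAM role:

import psycopg2

# Everything below (host, credentials, table, bucket, role) is a placeholder
# for my real setup.
conn = psycopg2.connect(
    host='my-cluster.abc123.eu-west-1.redshift.amazonaws.com',
    port=5439,
    dbname='mydb',
    user='myuser',
    password='mypassword')

copy_sql = """
    COPY my_table
    FROM 's3://my-bucket/my-prefix/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    BLANKSASNULL
    DELIMITER ';'
    EMPTYASNULL
    NULL AS 'null'
    ESCAPE
    FILLRECORD;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)   # fails the same way as the Firehose-triggered COPY on the bad files

conn.close()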
I have managed to make it work and get the values into Redshift when only one record is buffered in Kinesis (although I had to create an extra column in the Redshift table for it). So, when only one Lambda execution happens during the buffer time, the Redshift table looks like this:
A      B    C    D      no_info_column
"a     b    c    d"     <null>
The problem comes when the Lambda executes several times during the buffer time, because then I get a file in S3 with the following text in it:
"a,b,c,d" "a1,b1,c1,d1"
And in Redshift I get the error Extra column(s) found,
because the COPY statement cannot find any row separation: both records end up on one line, so COPY reads them as a single record with too many columns.
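To make the failure mode concrete, this is my understanding of what ends up in the file when two records are buffered (a rough simulation; whether Firehose puts a space or nothing between records, there is no newline either way):

import json

# Two Lambda executions buffered together: the two JSON-encoded strings
# land on a single line with no newline between them.
record1 = json.dumps('a;b;c;d')
record2 = json.dumps('a1;b1;c1;d1')
print(record1 + record2)   # "a;b;c;d""a1;b1;c1;d1" -- one line, so COPY sees one giant row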
I have tried the following without success:
- Returning the string from the Lambda
- Searching for how to set a row delimiter in COPY (SO question)
- Converting a list to JSON instead of a string; then I had problems with the brackets that open the list (see the sketch after this list)
- Using REMOVEQUOTES in the COPY statement
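On the third point, what I tried was roughly this (reconstructed, with the same placeholder client setup as above):

import json
import boto3

kinesis_client = boto3.client('kinesis', region_name='eu-west-1')  # placeholder region

final_values = ['a', 'b', 'c', 'd']
put_response = kinesis_client.put_record(
    StreamName='PixelTrack',
    Data=json.dumps(final_values),   # writes ["a", "b", "c", "d"], and COPY trips over the brackets
    PartitionKey='first')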
My original question was going to be "How to copy from S3 to Redshift with different rows separated by double quotes", but maybe the problem is in my initial approach, so I have decided to make the question a bit broader.
So, how could I solve this?