0
votes

We are using data pipeline to push the data from s3 to the amazon redshift table. We are pushing the data using JSON

{
"jsonpaths": [
    "$['data1']",
    "$['data1']",
    "$['data1']['data1']",
    "$['data1']['data1']",
    "$['data1']['data1']",
    "$['data1']['data1']",
    "$['data1']",
    "$['data1']",
    "$['data1']",
    "$['data1']['data1']",
    "$['data1']['data1']"
 ]}

I recently want to add created_at in my redshift table which should take the current timestamp. I have added the column in my redshift table with default value to current timestamp. I have not defined in the json mapping file for this column (for obvious reasons). But the pipeline is failing stating that the no of columns and json mapping is not matching. Have tried reading multiple documents but have not got the proper solution for this. Can anyone help me on how can we insert default current timestamp to redshift table from s3 using aws data pipeline?

1

1 Answers

0
votes

There is option in copy command to specify columns you wish to populate data, that we use in our ETL, that populates any default value for skipped columns.

COPY my_tablename
    (C1  
    ,C2   
    ,C3,...    
  )
    FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret- access-key>';

So I could apply same in case of pipeline like below, though I have not tested this solutions,

copy My_table(C1,C2,C3,...)
from 's3://mybucket/category_object_paths.json'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' 
json 's3://mybucket/category_jsonpath.json';

EDIT After Comment

There is transformSql option that you can use RedshiftCopyActivity to copy selective columns.

Here is reference, http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-redshiftcopyactivity.html