AWS Data Pipeline scheduling with expressions and date functions

Question

I want to schedule an AWS Data Pipeline job to run hourly, and I would like it to create hourly partitions on S3. Something like:

s3://my-bucket/2016/07/19/09/
s3://my-bucket/2016/07/19/10/
s3://my-bucket/2016/07/19/11/

I am using expressions for my EMRActivity for this:

s3://my-bucket/#{year(minusHours(@scheduledStartTime,1))}/#{month(minusHours(@scheduledStartTime,1))}/#{day(minusHours(@scheduledStartTime,1))}/#{hour(minusHours(@scheduledStartTime,1))}

However, the hour and month functions give me values such as 7 for July instead of 07, and 3 for the 3rd hour instead of 03. I would like to get months, days, and hours zero-padded (with a leading 0 when required).

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-reference-functions-datetime.html

Ruchir Bhargava · Accepted Answer · 2016-08-10T19:04:35

You can use the format function to get hours/months in the format you want.

#{format(myDateTime,'YYYY-MM-dd hh:mm:ss')}

Refer to the link for more details: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-reference-functions-datetime.html

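In your case, this should give the hour with a leading zero:

#{format(minusHours(@scheduledStartTime,1), 'HH')}

Note the capital 'HH'. Assuming the format function follows Joda-Time pattern letters, 'hh' is the 12-hour clock (01-12) and 'HH' is the 24-hour clock (00-23), so 'HH' is the one that matches hourly partitions. You can replace 'HH' with 'MM' to get zero-padded months, or with 'dd' to get zero-padded days.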
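
To sanity-check the layout you expect, here is a minimal Python sketch of the same idea outside Data Pipeline: subtract one hour from the scheduled start time, then format each part with two digits. This is not Data Pipeline code; the function name hourly_partition and its bucket parameter are hypothetical, and Python's %H plays the role of a 24-hour, zero-padded hour.

```python
from datetime import datetime, timedelta


def hourly_partition(scheduled_start: datetime, bucket: str = "my-bucket") -> str:
    """Return the S3 prefix for the hour before scheduled_start (hypothetical helper)."""
    previous_hour = scheduled_start - timedelta(hours=1)
    # %m, %d and %H always produce two digits.
    # %H is the 24-hour clock (00-23); %I would be the 12-hour clock (01-12).
    return f"s3://{bucket}/" + previous_hour.strftime("%Y/%m/%d/%H") + "/"


if __name__ == "__main__":
    print(hourly_partition(datetime(2016, 7, 19, 10, 0)))
    # Crossing midnight also rolls the day and month back correctly.
    print(hourly_partition(datetime(2016, 7, 1, 0, 0)))
```

A run scheduled for 10:00 on 2016-07-19 maps to the 09 partition of that day, which is the layout shown in the question. Formatting the whole date in one call also avoids repeating the minus-one-hour step for each path segment.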
