
I have configured an AWS Data Pipeline to export a DynamoDB table to an S3 bucket in a different account (using the export template). The export works just fine, but I run into issues when I try to restore the backup into a new table in that second account (using the import template too).

My source of information for this task: https://aws.amazon.com/premiumsupport/knowledge-center/data-pipeline-account-access-dynamodb-s3/
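For context, the cross-account part of that article boils down to the account running the import pipeline needing read access to the export bucket. A rough sketch of such a bucket policy follows; the account ID and role name below are placeholders, not my actual values:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::222222222222:role/DataPipelineDefaultRole" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::dynamodb-backup-imported",
      "arn:aws:s3:::dynamodb-backup-imported/*"
    ]
  }]
}
```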

  1. I can see that AWS Data Pipeline is restoring data to the new table (though I am not sure whether all of the data is restored); however, the execution ends with status CANCELED.

  2. The activity log shows this message several times: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'RUNNING' with reason 'null'.

2.a) Then comes the cancellation: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'CANCELLED' with reason 'Job terminated'.

See the full log below (I kept only a few lines around the error from point 2):

07 Sep 2020 12:52:04,844 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.EmrActivity@1d0415c
07 Sep 2020 12:52:04,887 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 private.com.amazonaws.services.datapipeline.factory.S3ClientFactory: Returning cached AmazonS3Client for the region [eu-west-1]
07 Sep 2020 12:52:04,945 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.EmrActivity: EMR transform starting.
07 Sep 2020 12:52:04,957 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client waiting for cluster to enter ready state for jobflow id 'j-11620944P11II'.
07 Sep 2020 12:52:04,957 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client checking if cluster is ready for jobflow with id 'j-11620944P11II'.
07 Sep 2020 12:52:05,141 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client reports that cluster with jobflow id 'j-11620944P11II' is ready.
07 Sep 2020 12:52:05,200 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client adding steps with request '{JobFlowId: j-11620944P11II,Steps: [{Name: df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1,ActionOnFailure: CONTINUE,HadoopJarStep: {Properties: [],Jar: s3://dynamodb-dpl-eu-west-1/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,Args: [org.apache.hadoop.dynamodb.tools.DynamoDBImport, s3://dynamodb-backup-imported/2020-09-06-12-19-11/, blabla-test6, 0.25]}}]}'
07 Sep 2020 12:53:05,352 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'RUNNING' with reason 'null'
07 Sep 2020 13:48:08,772 [WARN] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59' with jobFlowId 'j-11620944P11II' is in status 'WAITING' because of the step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' failures 'Job terminated'
07 Sep 2020 13:48:08,772 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'CANCELLED' with reason 'Job terminated'
07 Sep 2020 13:48:08,772 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI null
07 Sep 2020 13:48:08,777 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg :
07 Sep 2020 13:48:08,777 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: Collecting steps logs for cluster with AMI/ReleaseLabel emr-5.23.0
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelperFactory: Getting the helper for version 2.8.3
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Uploading step log details
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: path to step logss3n://srh-data-export-int2/df-06812232H5PDR4VVK472/EmrClusterForLoad/@EmrClusterForLoad_2020-09-07T12:45:59/@EmrClusterForLoad_2020-09-07T12:45:59_Attempt=1/j-11620944P11II/steps
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: step log file /mnt/taskRunner/output/logs/df-06812232H5PDR4VVK472/TableLoadActivity/@TableLoadActivity_2020-09-07T12:45:59/@TableLoadActivity_2020-09-07T12:45:59_Attempt=1/hadoop.jobs.log
07 Sep 2020 13:48:08,782 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done uploading hadoop log details
07 Sep 2020 13:48:08,842 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Field value updated 
07 Sep 2020 13:48:08,842 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done updating the field with value 
07 Sep 2020 13:48:08,844 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @TableLoadActivity_2020-09-07T12:45:59_Attempt=1
07 Sep 2020 13:48:08,845 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.TaskPoller: Work EmrActivity took 56:4 to complete
  3. If I go to the dependency "EmrClusterForLoad", I see this:
@failureReason Resource timeout due to terminateAfter configuration
@status TIMEDOUT

(screenshot)

  4. My activity is the one shown in the screenshot, and its step field has this configuration:
s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBImport,#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}

(screenshot)
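To make the step field above easier to read, here is a small sketch (not part of the pipeline itself) of how Data Pipeline expands the #{...} placeholders into the concrete EMR step arguments that appear in the log; the parameter values are the ones from my run:

```python
# Sketch: how the step definition's #{...} placeholders resolve into the
# EMR step arguments visible in the task runner log above.
params = {
    "myDDBRegion": "eu-west-1",
    "input.directoryPath": "s3://dynamodb-backup-imported/2020-09-06-12-19-11/",
    "output.tableName": "blabla-test6",
    "output.writeThroughputPercent": "0.25",
}

step = (
    "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/"
    "emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,"
    "org.apache.hadoop.dynamodb.tools.DynamoDBImport,"
    "#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}"
)

for key, value in params.items():
    step = step.replace("#{%s}" % key, value)

# The step string is "jar,arg1,arg2,...": the tools jar, then the main class,
# the backup prefix in S3, the target table, and the write throughput ratio.
jar, *args = step.split(",")
print(jar)
print(args)
```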

  5. Resources config: (screenshot)

I would like to understand what I am missing here and how to solve it, or whether this is a bug. I do see some data in the new table, so the restore is at least partially working, but the log sequence (status 'WAITING' with reason 'Cluster ready after last step completed.', followed by the step being CANCELLED) suggests the restore may not have completed fully.

Then I read somewhere that the error message @failureReason Resource timeout due to terminateAfter configuration can be addressed by adding an optional field called terminateAfter, but that field is not available in my Architect view.
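If Architect does not expose the field, it should still be possible to set it by exporting the pipeline definition, editing the EmrCluster object, and re-uploading it with put-pipeline-definition. A sketch of the relevant object (other fields elided, and the 6-hour value is just an example):

```json
{
  "id": "EmrClusterForLoad",
  "name": "EmrClusterForLoad",
  "type": "EmrCluster",
  "releaseLabel": "emr-5.23.0",
  "region": "#{myDDBRegion}",
  "terminateAfter": "6 Hours"
}
```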


2 Answers


ANSWERING MY OWN QUESTION

The issue was that I had set the Terminate After field. I got the warning message shown in the screenshot below, which suggested setting it, so I set Terminate After to 1 hour. I picked that value because the file to be imported was only 9.6 MB; how much time could such a small file possibly need? (screenshot)

It turned out that the import of that small file took about 5 hours.

Findings:

To improve the import time I increased the myDDBWriteThroughputRatio value from 0.25 to 0.95. I initially left that parameter alone because 0.25 was the default value from the template, and the AWS documentation often simplifies things that you end up having to discover by trial and error.

After changing that value the import took about an hour, which is far better than 5 hours but still slow considering we are talking about only 9.6 MB. (screenshot)
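This behaviour makes sense as a back-of-envelope calculation: as far as I understand, the import job throttles its writes to roughly writeThroughputPercent times the table's provisioned write capacity. All the concrete numbers below (item count, item size, provisioned WCU) are assumptions for illustration, not measured values:

```python
# Rough lower bound on import duration: the EMR import job limits itself to
# throughput_ratio * provisioned write capacity (writes per second).
def import_hours(total_write_units, provisioned_wcu, throughput_ratio):
    writes_per_second = provisioned_wcu * throughput_ratio
    return total_write_units / writes_per_second / 3600

# Suppose the 9.6 MB backup holds ~96,000 items of <=1 KB (1 WCU each)
# and the table is provisioned with 10 WCU (an assumed figure):
slow = import_hours(96_000, 10, 0.25)  # the template default ratio
fast = import_hours(96_000, 10, 0.95)  # the increased ratio
print(f"ratio 0.25: ~{slow:.1f} h, ratio 0.95: ~{fast:.1f} h")
```

With these assumed numbers the ratio change alone cuts the estimate by almost 4x, which is in the same ballpark as the 5 hours vs 1 hour I observed.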

Then there is the message I saw in the logs, is in status 'WAITING' and reason 'Cluster ready after last step completed.', which worried me a bit since I am new to this tool and did not quite get it. It turns out to be harmless, as explained by someone from AWS:


If you see that the EMR cluster is in 'Waiting, Cluster ready after last step completed', it means that the cluster has executed the first request it received and is waiting to execute the next request/activity on the cluster.

These are all my findings; hopefully they will help someone else.


In my case it worked after I changed the table's capacity mode to on-demand.
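For reference, switching an existing table to on-demand can be done with the UpdateTable API (BillingMode PAY_PER_REQUEST). A sketch using boto3 follows; the table name is the example one from the question, and the request is only constructed here (the actual call is commented out) so the snippet runs without AWS credentials:

```python
# Sketch: request to switch the target table's billing mode to on-demand.
request = {
    "TableName": "blabla-test6",          # example table name from the question
    "BillingMode": "PAY_PER_REQUEST",     # on-demand; the other mode is "PROVISIONED"
}
# With credentials configured, the actual call would be:
#   import boto3
#   boto3.client("dynamodb").update_table(**request)
print(request)
```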