0
votes

We are trying to back up a DynamoDB table to S3 with AWS Data Pipeline, using the default template provided by AWS (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part2.html). However, the job always fails with the error below. Changing the EMR release doesn't change the error message.

Does anyone know what can cause this error?

31 May 2016 09:57:10,013 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.EmrActivity@523f31f2
31 May 2016 09:57:10,086 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.EmrActivity: EMR transform starting.
31 May 2016 09:57:10,093 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client waiting for cluster to enter ready state for jobflow id 'j-2TUYGWQ1PYAHC'.
31 May 2016 09:57:10,094 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client checking if cluster is ready for jobflow with id 'j-2TUYGWQ1PYAHC'.
31 May 2016 09:57:10,226 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client reports that cluster with jobflow id 'j-2TUYGWQ1PYAHC' is ready.
31 May 2016 09:57:10,320 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client adding steps with request '{JobFlowId: j-2TUYGWQ1PYAHC,Steps: [{Name: df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4,ActionOnFailure: CONTINUE,HadoopJarStep: {Properties: [],Jar: s3://dynamodb-emr-eu-west-1/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,Args: [org.apache.hadoop.dynamodb.tools.DynamoDbExport, s3://my-db-backup.dev01.rule//2016-05-30-12-58-18, my-db.dev01.rule, 0.25]}}]}'
31 May 2016 09:58:10,506 [WARN] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18' with jobFlowId 'j-2TUYGWQ1PYAHC' is in status 'WAITING' because of the step 'df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' failures 'null'
31 May 2016 09:58:10,507 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' with jobFlowId 'j-2TUYGWQ1PYAHC' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' is in status 'FAILED' with reason 'null'
31 May 2016 09:58:10,507 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI 2.4.8
31 May 2016 09:58:10,517 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg :Exception in thread "main" java.lang.NoClassDefFoundError: com/amazon/ws/emr/core/InstanceInfo
    at org.apache.hadoop.dynamodb.DynamoDBUtil.getDynamoDBEndpoint(DynamoDBUtil.java:268)
    at org.apache.hadoop.dynamodb.DynamoDBClient.initConfigurations(DynamoDBClient.java:369)
    at org.apache.hadoop.dynamodb.DynamoDBClient.<init>(DynamoDBClient.java:88)
    at org.apache.hadoop.dynamodb.DynamoDBClient.<init>(DynamoDBClient.java:83)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.setTableProperties(DynamoDbExport.java:93)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.run(DynamoDbExport.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.main(DynamoDbExport.java:30)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.ClassNotFoundException: com.amazon.ws.emr.core.InstanceInfo
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more
31 May 2016 09:58:10,517 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: Collecting steps logs for cluster with AMI/ReleaseLabel 2.4.8
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelperFactory: Getting the helper for version 1.0.3
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Uploading step log details
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: path to step logss3n://my-db.dev01.rule-logs/df-09387105FF7URCW5QOR/EmrClusterForBackup/@EmrClusterForBackup_2016-05-30T12:58:18/@EmrClusterForBackup_2016-05-30T12:58:18_Attempt=2/j-2TUYGWQ1PYAHC/steps
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: step log file /mnt/taskRunner/output/logs/df-09387105FF7URCW5QOR/TableBackupActivity/@TableBackupActivity_2016-05-30T12:58:18/@TableBackupActivity_2016-05-30T12:58:18_Attempt=4/hadoop.jobs.log
31 May 2016 09:58:10,522 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done uploading hadoop log details
31 May 2016 09:58:10,763 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Field value updated 
31 May 2016 09:58:10,763 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done updating the field with value 
31 May 2016 09:58:10,767 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @TableBackupActivity_2016-05-30T12:58:18_Attempt=4
31 May 2016 09:58:10,767 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.TaskPoller: Work EmrActivity took 1:0 to complete
It looks like the EMR job is missing dependencies. Since Data Pipeline is a managed service, there is nothing you can fix yourself. Reach out to AWS Support. - Shibashis
Which version of EMR are you using? - mr0re1

2 Answers

0
votes

You may be using EMR 4.x. I suggest you try it with AMI 3.8.0. Let us know if you still run into issues.
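
As a rough sketch of that change, assuming the pipeline definition has been exported as JSON in the usual `objects` list form, the `EmrCluster` object could be pinned to a 3.x AMI along these lines. `amiVersion` and `releaseLabel` are the `EmrCluster` field names as I understand them; the helper `pin_ami_version` and the trimmed-down sample definition are made up for illustration, so check them against your own definition before relying on this.

```python
import copy


def pin_ami_version(definition, ami_version="3.8.0"):
    """Return a copy of a pipeline definition with every EmrCluster pinned to an AMI version.

    Assumes the file-style format: {"objects": [{"id": ..., "type": ..., ...}]}.
    """
    patched = copy.deepcopy(definition)
    for obj in patched.get("objects", []):
        if obj.get("type") == "EmrCluster":
            # releaseLabel selects an EMR 4.x+ release, so drop it when pinning an AMI.
            obj.pop("releaseLabel", None)
            obj["amiVersion"] = ami_version
    return patched


# Hypothetical, trimmed-down definition for illustration only.
definition = {
    "objects": [
        {"id": "EmrClusterForBackup", "type": "EmrCluster", "releaseLabel": "emr-4.6.0"},
        {"id": "TableBackupActivity", "type": "EmrActivity"},
    ]
}

patched = pin_ami_version(definition)
print(patched["objects"][0])
```

The function works on a copy, so the original definition stays available for comparison before you upload the patched one.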

0
votes

I have a question: do you run your pipeline from the web console, or from a program? I'm asking because you should check that all fields are filled in correctly. It could be that you've left out the region, so it can't find a method signature for an empty parameter where a String (e.g. eu-west-1) is expected.
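
If the pipeline is created from a program, a quick check for blank fields might look like the sketch below. It assumes the parameter values are available as a plain dictionary; the `my*` parameter names are my assumption about what the export template uses, and `find_blank_values` is a hypothetical helper, not part of any AWS SDK.

```python
def find_blank_values(values):
    """Return the names of parameters whose value is missing or only whitespace."""
    return sorted(
        name
        for name, value in values.items()
        if value is None or not str(value).strip()
    )


# Hypothetical parameter values; the my* names are assumed, not taken from the question.
values = {
    "myDDBRegion": "",
    "myDDBTableName": "my-db.dev01.rule",
    "myOutputS3Loc": "s3://my-db-backup.dev01.rule/",
    "myDDBReadThroughputRatio": "0.25",
}

blank = find_blank_values(values)
print(blank)
```

Anything reported here is worth filling in before activating the pipeline again.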

You can trace the code flow in https://github.com/awslabs/emr-dynamodb-connector/blob/master/emr-dynamodb-tools/src/main/java/org/apache/hadoop/dynamodb/tools/DynamoDBExport.java. Keep in mind, however, that this class may not match the version you are running, so the line numbers may not line up. Still, it gives you a rough idea of what happens there.