Is there a problem with the HIVE script below or is this another issue, possibly related to the version of HIVE installed by AWS Data Pipeline?
The first part of my AWS Data Pipeline must export large tables from DynamoDB to S3 to later process using EMR. The DynamoDB table that I'm using for testing is only a few rows long, so I know that the data is formatted correctly.
The script associated with the AWS Data Pipeline "Export DynamoDB to S3" building block works correctly for tables that contain only primitive_types
but don't export array_type
s. (reference - http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html)
I pulled out all Data Pipeline-specific stuff and am now trying to get the following minimal example based on the DynamoDB docs to work - (reference - http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html)
-- Drop table
DROP table dynamodb_table;
--http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
CREATE EXTERNAL TABLE dynamodb_table (song string, artist string, id string, genres array<string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "InputDB",
"dynamodb.column.mapping" = "song:song,artist:artist,id:id,genres:genres");
INSERT OVERWRITE DIRECTORY 's3://umami-dev/output/colmap/' SELECT *
FROM dynamodb_table;
Here is the stack-trace / EMR errors that I'm see when running the above script -
Diagnostic Messages for this Task:
java.io.IOException: IO error in map input file hdfs://172.31.40.150:9000/mnt/hive_0110/warehouse/dynamodb_table
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:244)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:218)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:238)
... 9 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.dynamodb.read.AbstractDynamoDBRecordReader.scan(AbstractDynamoDBRecordReader.java:176)
at org.apache.hadoop.hive.dynamodb.read.HiveDynamoDBRecordReader.fetchItems(HiveDynamoDBRecordReader.java:87)
at org.apache.hadoop.hive.dynamodb.read.HiveDynamoDBRecordReader.next(HiveDynamoDBRecordReader.java:44)
at org.apache.hadoop.hive.dynamodb.read.HiveDynamoDBRecordReader.next(HiveDynamoDBRecordReader.java:25)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
... 13 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Command exiting with ret '255'
I tried a few things already to debug, but none of them have been successful - creating an external table w/formatting, using a few different JSON SerDes. I'm not sure what to try next.
Many thanks.
2The CureInbetween Days\N\n 1The CureBoys Dont Cry\N\n 3Depeche ModePimpf\N\n
The "\N" is actually in the output and the "\n"s are real line-breaks. β Elliot