I export my DynamoDB tables to S3 as a means of backup (via EMR). When I export, I store the data as LZO-compressed files. My Hive query is below; essentially I followed the "To export an Amazon DynamoDB table to an Amazon S3 bucket using data compression" procedure at http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
I now want to do the reverse: take my LZO file(s) and get them back into a Hive table. How do you do this? I was expecting to find a Hive configuration property for input compression, analogous to hive.exec.compress.output, but there doesn't seem to be one. I've googled and found some hints, but nothing definitive and nothing that works.
The files in S3 are named in this format: s3://[mybucket]/backup/year=2012/month=08/day=01/000000.lzo
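To show what I mean, here is roughly the import side I've been attempting (s3_import is just my name for the table; I assumed Hive would decompress the .lzo files transparently on read, since the data underneath is the same comma-delimited rows the export wrote, but nothing I've tried along these lines has worked):

-- external table over the same S3 path the export wrote to
CREATE EXTERNAL TABLE s3_import (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://[mybucket]/backup';

-- register the partition holding the .lzo file so Hive will scan it
ALTER TABLE s3_import ADD PARTITION (year='2012', month='08', day='01');

SELECT * FROM s3_import LIMIT 10;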
Here is my HQL that does the export:
SET dynamodb.throughput.read.percent=1.0;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
CREATE EXTERNAL TABLE hiveSBackup (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMOTABLENAME}",
"dynamodb.column.mapping" = "id:id,periodStart:periodStart,allotted:allotted,remaining:remaining,created:created,seconds:seconds,served:served,modified:modified");
CREATE EXTERNAL TABLE s3_export (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';
INSERT OVERWRITE TABLE s3_export
PARTITION (year="${PARTITIONYEAR}", month="${PARTITIONMONTH}", day="${PARTITIONDAY}")
SELECT * FROM hiveSBackup;
Any ideas on how to get the data back from S3, decompress it, and load it into a Hive table?
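For completeness, the only input-side knob I've turned up in my searching is making sure the LZO codecs are registered with Hadoop, something along these lines (this is a guess assembled from scattered hints, not something I've confirmed works):

-- register the LZO codecs so Hadoop can pick a decompressor by the .lzo file extension
SET io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec;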