0
votes

I'm trying to implement cross-region copying from us-east-1 to us-west-1.

I used the cross-region copying template in Amazon Data Pipeline to copy a table every couple hours however I can't get incremental copying working. I have to fill in the Filter SQL field under Activities under DefaultHiveCopyActivity1. The table contains a column called timestamp that is a string with the format yyyy-MM-dd HH:mm:ss.SSS which contains the time of request.

I've been trying to convert the string into a unix_timestamp and subtracting 7200 from the current value of unix_timestamp (as the pipeline activity is supposed to run every two hours hence find every record in the table with the timestamp of 2 hours ago onward). I use this Filter SQL field value: unix_timestamp(timestamp, "yyyy-MM-dd HH:mm:ss.SSS") > unix_timestamp() - 7200 The pipeline fails.

I've even tried checking for a simple request such as putting the following into the Filter SQL field (it adds the backslashes on its own): timestamp = \"2014-04-30 22:55:41.301\" - according to the documentation I should be entering: a Hive SQL statement fragment that filters a subset of DynamoDB or Amazon S3 data to copy. The filter should only contain predicates and not begin with a WHERE clause, because AWS Data Pipeline adds it automatically. This pipeline fails as well even though there is a record with that exact value in the table. A full table copy from region to region with no SQL filter works fine though.

This is the error I'm getting: FAILED: Parse Error: line 2:131 cannot recognize input near 'timestamp' ',' '"yyyy-MM-dd HH:mm:ss.SSS"' in function specification

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-crossregionddbcopy.html http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-hivecopyactivity.html

1

1 Answers

0
votes

The timestamp field in the HiveQL query should be backticked like this as timestamp appears to be a keyword in Hive while the timestamp column name is being referred to: unix_timestamp(`timestamp`, "yyyy-MM-dd HH:mm:ss.SSS") > unix_timestamp() - 7200