I have DynamoDB tables that are backed up to S3, and I need a (fairly minimal) way to run bespoke queries on our data that DynamoDB itself doesn't support. Athena seems like a nice way to use our S3 backups as the source for our proto DW/BI tool.
The issue is that the DynamoDB backup to S3 stores each item in this format:
{"Column1":{"n":"1234"},"Column2":{"n":"5678"},"Column3":{"s":"abcd"}}
This makes it difficult for Athena / Presto to read. I can import the data using:
CREATE EXTERNAL TABLE test_table (
column1 struct<n:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
But this is not ideal, as the 'number' is still treated as a string. I have considered downloading the file, flattening the JSON, and re-uploading it to a different place, BUT the file is not valid JSON as a whole; only each individual row is valid JSON (it's newline-delimited).
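For reference, the flattening I had in mind would be something like the sketch below. It only handles the "n" (number) and "s" (string) type tags from my example; real backups can also contain other DynamoDB type descriptors ("b", "l", "m", etc.), and the column names are just the ones from my sample row:

```python
import json

def flatten_item(line):
    """Flatten one newline-delimited-JSON backup row, e.g.
    {"Column1":{"n":"1234"}} -> {"Column1": 1234}."""
    item = json.loads(line)
    flat = {}
    for column, typed in item.items():
        # Each value is a single-key dict: {type_tag: raw_value}
        tag, value = next(iter(typed.items()))
        if tag == "n":
            # DynamoDB exports numbers as strings; parse to int or float
            flat[column] = float(value) if "." in value else int(value)
        else:
            # "s" and anything else left as-is for this sketch
            flat[column] = value
    return flat

backup_line = '{"Column1":{"n":"1234"},"Column2":{"n":"5678"},"Column3":{"s":"abcd"}}'
print(json.dumps(flatten_item(backup_line)))
# -> {"Column1": 1234, "Column2": 5678, "Column3": "abcd"}
```

Each input line would be flattened independently and written back out one JSON object per line, so the re-uploaded file has the same newline-delimited shape the original backup does.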
Is there a way for Athena to read the data in the file so that the field types are imported correctly? Or, ideally, is there a way for Athena to flatten the {"n":"1234"} into an int?
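The only workaround I have at the moment is casting at query time against the struct-typed table from my CREATE EXTERNAL TABLE above, which works but clutters every query:

```sql
-- column1 is struct<n:string>, so reach into the struct and cast
SELECT CAST(column1.n AS integer) AS column1
FROM test_table;
```

I'd rather the table expose the correct types directly than repeat this in every query (or wrap it in a view).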