8
votes

I am new to DynamoDB, and I wonder whether generating a report from this key/value data store is any different from generating one from a relational DBMS.

My (Java) application writes data into DynamoDB, and I am hoping to generate business reports (e.g. a sales report) from it.

My understanding is that Amazon provides EMR (Elastic MapReduce), and from further reading, that it has Hive underneath, which would allow me to query DynamoDB with SQL-like syntax.

If my data is less than 50 GB, is using EMR overkill for this task?


1 Answer

7
votes

Yes, Hive uses SQL-like syntax. Hive itself is written in Java, so under the hood it is still Java. The Hive wiki is a good place to start. Here is a good article about using DynamoDB with EMR: http://aws.amazon.com/articles/28549

If my data is less than 50 GB, is using EMR overkill for this task?

I don't think so. Once you have EMR set up and have exported the DynamoDB table to S3 or to an internal Hadoop table, you can query that copy without affecting DynamoDB's provisioned throughput capacity. Since S3 is very fast, you can write all sorts of complex Hive queries to get the reports you want.
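To make the export-then-query idea a bit more concrete, here is a rough sketch of what the Hive script could look like, built from Java since that is what your application uses. The storage handler class and the `dynamodb.*` properties come from the EMR DynamoDB connector; everything else is an assumption for illustration only: the `HiveExportScript` class, the table name `Sales`, the attributes `OrderId`, `Region`, `Amount`, and the bucket path are all made up, so adjust them to your own schema.

```java
public class HiveExportScript {

    // Hypothetical schema: replace with your own Hive columns and DynamoDB attributes.
    static final String COLUMNS = "order_id string, region string, amount double";
    static final String MAPPING = "order_id:OrderId,region:Region,amount:Amount";

    /** Returns a Hive script that copies a DynamoDB table to S3 and reports on the copy. */
    static String build(String dynamoTable, String s3Location) {
        return String.join("\n",
            "-- Let the export use at most half of the table's provisioned read capacity.",
            "SET dynamodb.throughput.read.percent=0.5;",
            "",
            "-- Hive table backed by the live DynamoDB table.",
            "CREATE EXTERNAL TABLE ddb_sales (" + COLUMNS + ")",
            "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'",
            "TBLPROPERTIES (",
            "  \"dynamodb.table.name\" = \"" + dynamoTable + "\",",
            "  \"dynamodb.column.mapping\" = \"" + MAPPING + "\");",
            "",
            "-- Hive table backed by files in S3.",
            "CREATE EXTERNAL TABLE s3_sales (" + COLUMNS + ")",
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
            "LOCATION '" + s3Location + "';",
            "",
            "-- One-off export: this is the only step that reads from DynamoDB.",
            "INSERT OVERWRITE TABLE s3_sales SELECT * FROM ddb_sales;",
            "",
            "-- Report queries run against the S3 copy only.",
            "SELECT region, SUM(amount) AS total_sales FROM s3_sales GROUP BY region;");
    }

    public static void main(String[] args) {
        // Prints the script so it can be saved and run with Hive on the EMR cluster.
        System.out.println(build("Sales", "s3://my-report-bucket/sales/"));
    }
}
```

Only the `INSERT OVERWRITE` step touches DynamoDB; any further report queries you write against `s3_sales` read from S3, which is why they do not eat into the table's read capacity.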

The command line tool for starting EMR is very easy to set up, and if you want to save money you can always bid for spot instances.

Also, if a job is running slowly, you can increase the number of core and task nodes to make it finish faster.