I would like to use AWS Data Pipeline to execute an ETL process. My process has a small input file, and I would like to use a custom JAR or Python script to perform the data transformations. I don't see any reason to spin up an EMR cluster for such a simple step, so I would like to execute it on a single EC2 instance instead.
Looking at the AWS Data Pipeline EmrActivity object, I only see the option to run on an EMR cluster. Is there a way to run a computation step on a plain EC2 instance? Is that the best solution for this use case? Or is it better to set up a small single-node EMR cluster and execute a Hadoop job?
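For context, what I would hope to define is something like the sketch below, assuming an activity such as ShellCommandActivity can be attached to a plain Ec2Resource (all ids, the instance type, and the script path are placeholders I made up):

```json
{
  "objects": [
    {
      "id": "MyEc2Resource",
      "type": "Ec2Resource",
      "instanceType": "t2.micro",
      "terminateAfter": "1 Hour"
    },
    {
      "id": "TransformStep",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEc2Resource" },
      "command": "python /home/ec2-user/transform.py"
    }
  ]
}
```

That is, a single activity that runs my transformation script on one instance, with no cluster involved.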