I am currently trying to run a quantitative data processing pipeline on AWS using RDS and EC2 instances. One portion of the pipeline requires significant computing power but is neither mission-critical nor time-critical, so I would like to use a cluster of EC2 spot instances for that stage.
I have been considering using the AWS Data Pipeline product to architect the pipeline. However, I am unsure how to integrate the spot instances. The AWS documentation suggests that, with Data Pipeline, spot instances can be used inside an AWS EMR cluster but not outside of one. I am looking for suggestions or best practices.