I have a batch of Python jobs that differ only in the input file they read, e.g.:
python main.py --input=file1.json > log_file1.txt
python main.py --input=file2.json > log_file2.txt
python main.py --input=file3.json > log_file3.txt
...
All of these jobs are independent and use the same prebuilt Anaconda environment.
I'm able to run my code on an on-demand EC2 instance using the following workflow:
- Mount an EBS volume with the input files and prebuilt conda environment.
- Activate the conda environment.
- Run the Python programs, each reading a different input file and writing to a separate log file. Both the input files and the log files live on the EBS volume.
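For concreteness, here is roughly what that workflow looks like as a single bootstrap script for one job. The device name, mount point, conda location, and environment name below are placeholders for my actual setup:

```shell
#!/usr/bin/env bash
# Bootstrap for one job -- a sketch of the workflow above.
# DEVICE, MOUNT_POINT, ENV_NAME, and the conda path are placeholders.
set -euo pipefail

DEVICE=/dev/xvdf          # how the EBS volume shows up on the instance
MOUNT_POINT=/mnt/data     # where I mount it
ENV_NAME=myenv            # name of the prebuilt conda environment
INPUT=file1.json          # the one input file this instance handles

sudo mkdir -p "$MOUNT_POINT"
sudo mount "$DEVICE" "$MOUNT_POINT"

# In my setup the conda install lives on the volume itself.
source "$MOUNT_POINT/miniconda3/etc/profile.d/conda.sh"
conda activate "$ENV_NAME"

cd "$MOUNT_POINT"
python main.py --input="$INPUT" > "log_${INPUT%.json}.txt"
```

I currently run these steps by hand on the on-demand instance; the script form is just to show what each instance would need to do.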
Now I want to scale this out with AWS spot instances: if I have N jobs, request N spot instances, each running one of the jobs above -- reading a different input file from an existing volume and writing its output to a different file on the same volume. I couldn't find a comprehensive guide on how to set this up, so any help would be appreciated.
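What I imagine is something like the following dry-run sketch using the AWS CLI, where each instance gets a per-job user-data script. The `echo` prints each launch command instead of executing it, and `AMI_ID`, `KEY_NAME`, and `SUBNET_ID` are placeholders -- I'm not sure this is the right approach, which is why I'm asking:

```shell
#!/usr/bin/env bash
# Dry-run sketch: request one spot instance per job via the AWS CLI.
# The echo prints each launch command rather than running it;
# AMI_ID, KEY_NAME, and SUBNET_ID are placeholders.
AMI_ID="ami-XXXXXXXX"
KEY_NAME="my-key"
SUBNET_ID="subnet-XXXXXXXX"

for i in 1 2 3; do
  # Per-job user-data: the same bootstrap steps as on the
  # on-demand instance, specialized to one input file.
  cat > "job_${i}.sh" <<EOF
#!/usr/bin/env bash
# (mount the EBS volume and activate the conda env here, as before)
python main.py --input=file${i}.json > log_file${i}.txt
EOF

  echo aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type m5.large \
    --key-name "$KEY_NAME" \
    --subnet-id "$SUBNET_ID" \
    --instance-market-options 'MarketType=spot' \
    --user-data "file://job_${i}.sh"
done
```

Is this the right general shape, or is there a better-suited service (Batch, fleets, etc.) for this kind of embarrassingly parallel workload?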