I'm trying to find out whether I could use an Intel Xeon Phi coprocessor to "parallelize" the following problem:
Say I have 2000 files that need to be processed by a single-threaded executable. For each file, the executable reads it, processes it, writes the result to a corresponding output file, then exits.
For instance:
for f in /path/to/*
do
    # take action on each file; quoting "$f" keeps names with spaces intact
    ./executable "$f" outFileCorrespondingTo_f
done
The tools are not written for multi-threaded execution or for looping over the files themselves, and we do not wish to change anything in their code for now. They're written in C with some external libraries.
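To make concrete what I mean by parallelizing the calls without touching the tool, here is a minimal sketch of the host-side equivalent. It assumes GNU-style `find` and `xargs` (`-print0`, `-0`, `-P`); I don't know whether the Phi's native OS provides them. The function name `run_all`, its arguments and the `.out` suffix are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run a single-threaded tool over many files, several at a time.
# run_all, its arguments and the ".out" suffix are hypothetical names.
run_all() {
    exe=$1      # path to the single-threaded executable
    in_dir=$2   # directory holding the input files
    out_dir=$3  # directory for the outputs (could be a host-mounted share)
    jobs=$4     # number of processes to run concurrently

    mkdir -p "$out_dir"
    # -print0 / -0 keep file names with spaces intact; -P sets the concurrency.
    # The inner sh receives: $1 = executable, $2 = output dir, $3 = input file.
    find "$in_dir" -maxdepth 1 -type f -print0 |
        xargs -0 -P "$jobs" -I{} \
            sh -c '"$1" "$3" "$2/$(basename "$3").out"' _ "$exe" "$out_dir" {}
}
```

It would be called as, for example, `run_all ./executable /path/to ./out 8`. Each input still gets its own process, so the tool itself stays unchanged; only the number of processes running at once changes.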
My questions are:
- Could this kind of "script-looping" be run on the Xeon Phi's native OS in such a way that it parallelizes the calls to the executable, so they run concurrently on all of its cores? Is it "general-purpose" enough for that?
- The files themselves are rather small, so the card's 8 GB of memory would be more than enough to hold the data at runtime, but not to keep all of the output on the device, so I would need to write the output to the host. So my second question is: is this kind of memory exchange possible "externally", i.e. not coded into the tool, but managed by the host OS and the device for every run of the executable?
- If this is possible, could it provide a performance boost at all, or would memory transfer and thread allocation become the bottleneck? Each execution takes a few seconds, depending on the length of the input file, and I'm fairly confident that is a few orders of magnitude longer than the time needed to transfer the file.