Intel Xeon Phi - running multiple single-threaded executables

Question

I'm trying to find out whether I could use an Intel Xeon Phi coprocessor to "parallelize" the following problem:

Say I have 2000 files that need to be processed by a single-threaded executable. For each file, the executable reads it, processes it, writes the result to a corresponding output file, then exits.

For instance:

for f in /path/to/*
do
    # take action on each file; quoting "$f" keeps names with spaces intact
    ./executable "$f" outFileCorrespondingTo_f
done

The tools are not coded for multi-threaded execution, or looping through the files, nor do we wish to change anything in their code for now. They're written in C with some external libraries.

My questions are:

Could this kind of "script-looping" be run on the Xeon Phi's native OS in such a way that it parallelizes the calls to the executable, so they run concurrently on all of its cores? Is it "general-purpose" enough for that?
The files themselves are rather small, so its 8GB memory would be more than enough for storing the data at runtime, but not for keeping all of the output on the device, so I would need to write the output to the host. So my second question is: is this kind of memory exchange possible "externally"?

i.e. not coded into the tool, but managed by the host OS and the device, for every execution of the executable.

If this is possible, could it provide a performance boost in any way, or would the memory and thread allocation bottlenecks be too severe? Each execution takes a few seconds, depending on the length of the input file, and I'm pretty confident this is a few orders of magnitude longer than the time needed to transfer the file.

Regarding performance, running multiple processes concurrently is likely to cause a lot of L2 cache contention/thrashing. Good L2 usage is usually very important for getting good performance on KNC. It depends on the workload though, so YMMV. - amckinley

Gilles · Accepted Answer · 2015-10-01T07:45:24

Xeon Phi co-processors run a fairly full-featured version of the Linux operating system, so most of what you are used to on a Linux box is likely to work on Xeon Phi as well.

Now, for your specific issue, GNU Parallel should let you do what you want with very little effort. You will need your file system mounted on the card so that the files are directly accessible, but this is standard for a Xeon Phi node. Be aware that the file transfers will generate some traffic on the PCIe link between the host and the co-processor.
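As a rough illustration of the GNU Parallel approach, here is a minimal sketch. It assumes GNU Parallel (a Perl script) is installed where the jobs run and that the input and output directories are reachable from there, for example through a host file system mounted on the card. The function name process_all, its four arguments and the .out suffix are made up for this example and do not come from the question.

```shell
#!/bin/sh
# Sketch only: run one unmodified single-threaded tool over every file in a
# directory, keeping several instances running at once with GNU Parallel.
# process_all, its arguments and the ".out" suffix are hypothetical names.
process_all() {
    in_dir=$1    # directory holding the input files
    out_dir=$2   # directory for the results (e.g. on a host-mounted file system)
    tool=$3      # the single-threaded executable, taking <input> <output>
    jobs=$4      # number of instances to run concurrently

    mkdir -p "$out_dir" || return 1

    # -print0 / -0 pass file names NUL-separated, so spaces in names are safe.
    # {} expands to the input path and {/} to its basename.
    # In this simple form, tool and out_dir themselves must not contain spaces.
    find "$in_dir" -maxdepth 1 -type f -print0 |
        parallel -0 -j "$jobs" "$tool" {} "$out_dir"/{/}.out
}

# Example call (paths and job count are placeholders):
#   process_all /path/to /path/to/output ./executable 60
```

The job count is worth experimenting with: the value that gives the best throughput depends on the workload and on how many cores and hardware threads the card exposes.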

Regarding performance, this is hard to tell: the lower single-threaded performance of Xeon Phi cores, together with the transfer times, suggests a significant penalty, but the level of parallelism you can extract from the device might well outweigh it, depending on how compute-intensive your workload is. The best answer is for you to give it a try.
