I'm working with a system implemented in C++ with OpenMP, and it uses STL and Eigen data structures throughout. Algorithmically, the code looks like a great candidate for acceleration with the new Intel MIC (Xeon Phi) cards.
A typical parallel loop in the code looks like this:
#pragma omp parallel for private(i)
for (i = 0; i < n; ++i) {
    computeIntensiveFunction(some_STL_or_eigen_container[i]);
}
The above pseudocode runs with reasonable performance, but it'd be great to offload some of it to the Xeon Phi card. Here's my attempt at doing this:
#pragma offload target(mic) // <---- NEW
#pragma omp parallel for private(i)
for (i = 0; i < n; ++i) {
    computeIntensiveFunction(some_STL_or_eigen_container[i]);
}
However, the Intel ICC/ICPC compiler spits out an error like this: error: function "computeIntensiveFunction" called in offload region must have been declared with compatible "target" attribute.
It seems that complaints like this appear for functions and data structures that involve STL or Eigen.
Any thoughts on how to get around this?
I'm new to Xeon Phi (a recovering CUDA programmer), so I don't entirely understand the boundaries of what can be offloaded.
-offload-attribute-target=mic
might be part of the solution here. – solvingPuzzles