Given a function such as the one below, its `for` loop can be either parallelized with OpenMP or vectorized (assuming the compiler performs the vectorization).
Example
```c
void function(float* a, float* b, float* c, int n)
{
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] * b[i];
    }
}
```
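For reference, here is a minimal sketch of the two alternatives, assuming GCC or Clang on an x86 target. The names `function_omp` and `function_sse` are hypothetical. The first variant spreads iterations across threads (thread-level parallelism). The second uses SSE intrinsics to multiply four floats per instruction on a single core (data-level parallelism), which is roughly what an auto-vectorizing compiler emits for this loop.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */
#endif

/* Variant 1 (hypothetical name): OpenMP splits the iteration range across
   threads. Compile with -fopenmp; without it the pragma is ignored and the
   loop simply runs serially. */
void function_omp(const float* a, const float* b, float* c, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Variant 2 (hypothetical name): explicit SSE vectorization on one core.
   Each iteration of the vector loop multiplies 4 floats at once. */
void function_sse(const float* a, const float* b, float* c, int n)
{
    int i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_mul_ps(va, vb)); /* 4 products, 1 store */
    }
#endif
    /* Scalar tail for the n % 4 leftover elements
       (or the whole loop when SSE is not available). */
    for (; i < n; i++)
        c[i] = a[i] * b[i];
}
```

Both functions compute the same `c`. The first scales with the number of cores, the second with the SIMD width of a single core.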
I would like to know:
- Whether there is any difference in performance between OpenMP and vectorization.
- Whether there is any advantage in using one over the other.
- Whether OpenMP and vectorization can be used together.
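On the last point, one possible combination is a sketch like the following, assuming an OpenMP 4.0 or later compiler (e.g. GCC or Clang with `-fopenmp`). The name `function_omp_simd` is hypothetical. The combined `parallel for simd` construct asks OpenMP to divide the iterations among threads and the compiler to vectorize each thread's chunk.

```c
/* Hypothetical name. Requires OpenMP 4.0+ for the combined construct.
   Without -fopenmp the pragma is ignored and this is a plain scalar loop. */
void function_omp_simd(const float* restrict a, const float* restrict b,
                       float* restrict c, int n)
{
    /* Threads each take a chunk of the iteration space; inside a chunk
       the loop body is executed with SIMD instructions. `restrict` tells
       the compiler the arrays do not overlap. */
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}
```

With GCC or Clang, `-fopenmp-simd` honors only the `simd` part of such pragmas without creating threads, which makes it easy to compare vectorization alone against threads plus vectorization.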
Note: I haven't considered the different SSE versions, the number of processors/cores (since the number of OpenMP threads scales with them), etc. My question is a general one, but more specific answers are welcome too.