I am trying to learn OpenMP, but my code runs slower with OpenMP than without it. There have been various postings about this, but none seems to apply to my issue. I have created a simple program that illustrates the point using 'omp parallel for'. When running it I got the following timings:
No OMP: 0.0109663 sec
Parallel for, 1 thread: 0.0076869 sec
Parallel for, 2 threads: 0.0151231 sec
Parallel for, 4 threads: 0.0169528 sec
Parallel for, 8 threads: 0.0150955 sec
With 2 to 8 threads the run takes roughly 1.4 to 1.5 times as long as the version without OpenMP, and about twice as long as the single-thread parallel run. Clearly this is not what I expected.
I am using Visual Studio Express 2015, and the result is the same with the optimizer on or off. I have set /openmp on the C/C++ compiler command line, and I believe I have set the shared and private clauses correctly. I am initializing an array with 1,000,000 entries, so the initial overhead of setting up the parallel threads should not be the issue. I have an Intel i7 with 8 cores.
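To separate one-time costs from the steady-state loop time, the measurement could be repeated and the fastest run kept. The following is only a minimal sketch under that assumption, not the code that produced the timings above. The names `timed_fill` and `best_fill_time` are hypothetical, and `std::chrono` is used instead of `omp_get_wtime()` so that the sketch also builds when OpenMP is not enabled.

```cpp
#include <chrono>
#include <vector>

// Hypothetical helper: fills `a` with the same values as the test loop and
// returns the elapsed wall-clock time in seconds.
double timed_fill(std::vector<double>& a, int threads)
{
    const int len = static_cast<int>(a.size());
    const auto start = std::chrono::steady_clock::now();
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for num_threads(threads)
    for (int la = 0; la < len; ++la)
        a[la] = la * 2 + 1;
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical helper: runs the fill `repeats` times and keeps the fastest run,
// so one-time costs mainly affect the first pass rather than the reported value.
double best_fill_time(int len, int threads, int repeats)
{
    std::vector<double> a(len);  // value-initialized, so the memory is written before timing
    double best = timed_fill(a, threads);
    for (int r = 1; r < repeats; ++r) {
        const double t = timed_fill(a, threads);
        if (t < best)
            best = t;
    }
    return best;
}
```

Calling, for example, `best_fill_time(1000 * 1000, 4, 10)` would give the fastest of ten passes with four requested threads, which could then be compared against the same call with one thread.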
Code: I have two functions, testparallelfor() and testnoomp(); the names should be self-explanatory. The statement ++th[omp_get_thread_num()]; just counts how many loop iterations each thread gets. The result is the same even if I comment that statement out. I have also tried using a static array, double a[1000*1000], to see whether the issue is with the dynamic heap allocation of a.
#include <iostream>
#include <omp.h>
using namespace std;

// Per-thread iteration counters (room for up to 8 threads).
static int th[8];

// Reset the counters; -1 means "no iterations yet" (out_th prints th[i] + 1).
void reset_th()
{
    for (int i = 0; i < 8; ++i)
        th[i] = -1;
}

// Print how many iterations each thread executed.
void out_th()
{
    cout << "Threads ";
    for (int i = 0; i < 8; ++i)
        cout << i << ":" << th[i] + 1 << ", ";
    cout << endl;
}

// Fill the array with `no` threads and report the elapsed time.
void testparallelfor(int len, int no)
{
    const int n = 1000 * 1000;
    double tw;
    double *a = new double[n];
    reset_th();
    tw = omp_get_wtime();
    #pragma omp parallel shared(a, len, th) num_threads(no) if (len > 1000)
    {
        #pragma omp for
        for (int la = 0; la < len; ++la)
        {
            ++th[omp_get_thread_num()];
            a[la] = la * 2 + 1;
        }
    }
    tw = omp_get_wtime() - tw;
    cout << "Parallel for " << tw << "sec" << endl;
    out_th();
    delete[] a; // release the buffer allocated above
}

// Same loop without any OpenMP pragma, as the baseline.
void testnoomp(int len)
{
    const int n = 1000 * 1000;
    double tw;
    double *a = new double[n];
    reset_th();
    tw = omp_get_wtime();
    for (int la = 0; la < len; ++la)
    {
        ++th[omp_get_thread_num()]; // always thread 0 outside a parallel region
        a[la] = la * 2 + 1;
    }
    tw = omp_get_wtime() - tw;
    cout << "No OMP " << tw << "sec" << endl;
    out_th();
    delete[] a; // release the buffer allocated above
}

int main()
{
    int n = 1000 * 1000;
    testnoomp(n); // no OpenMP
    for (int i = 1; i <= 8; i *= 2)
        testparallelfor(n, i); // i is the number of threads to be used
    cout << endl;
    return 0;
}
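One variant that could be tried to rule out the shared counter array as a factor: each thread counts in a private local variable and writes its total only once after the loop, instead of incrementing `th[omp_get_thread_num()]` on every iteration. This is only a sketch under the assumption that neighbouring `th` entries may sit on the same cache line; `fill_and_count` is a hypothetical name, and the `_OPENMP` guards are there so the sketch also compiles when OpenMP is disabled.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical variant of the test loop: per-thread counts are accumulated in a
// private local variable and stored once, so the shared vector is not written
// inside the hot loop. Assumes threads >= 1.
std::vector<int> fill_and_count(std::vector<double>& a, int threads)
{
    std::vector<int> counts(threads, 0);
    const int len = static_cast<int>(a.size());
    #pragma omp parallel num_threads(threads)
    {
        int id = 0;  // stays 0 when the code runs serially
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        int local = 0;  // private to each thread
        #pragma omp for
        for (int la = 0; la < len; ++la) {
            ++local;
            a[la] = la * 2 + 1;
        }
        counts[id] = local;  // one write per thread after the loop
    }
    return counts;
}
```

The returned counts should add up to the array length regardless of how many threads the runtime actually provides, which gives the same per-thread breakdown that `out_th()` prints.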
Any help or insight would be appreciated.