I'm developing a parallel algorithm on an Intel Core i5 machine with two physical cores and four hardware threads.
Here n is the size of the matrix I perform the calculations on. As the table below shows, going from 1 thread to 2 threads cuts the run time by almost 50%, but going from 2 threads to 4 threads makes almost no difference. The numbers are run times in seconds.
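Expressed as speedup and parallel efficiency (the standard definitions, with $T_p$ the run time using $p$ threads), the observations above amount to the following. An ideal result for 4 threads would be $S_4 = 4$, $E_4 = 1$.

$$
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
$$

$$
T_2 \approx \tfrac{1}{2}T_1 \;\Rightarrow\; S_2 \approx 2,\; E_2 \approx 1; \qquad
T_4 \approx T_2 \;\Rightarrow\; S_4 \approx 2,\; E_4 \approx 0.5
$$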
I'm compiling with MinGW gcc on Windows and parallelizing with OpenMP. I set the number of threads with `omp_set_num_threads(numThreads);` at the beginning of the parallel routine.
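A minimal sketch of setting the thread count and checking what the runtime actually uses. `omp_set_num_threads`, `omp_get_max_threads`, `omp_get_num_threads` and `omp_get_num_procs` are standard OpenMP runtime calls; the helper names `configure_threads` and `report_threads` are hypothetical, and the `#ifdef _OPENMP` guards are only there so the file still compiles without `-fopenmp`. `omp_set_num_threads` is meant to be called from the serial part of the program, before the parallel region it should affect.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request numThreads threads for subsequent parallel
   regions and return how many the runtime will use by default.
   Without OpenMP support everything runs on a single thread. */
int configure_threads(int numThreads)
{
#ifdef _OPENMP
    omp_set_num_threads(numThreads);   /* call outside any parallel region */
    return omp_get_max_threads();
#else
    (void)numThreads;
    return 1;
#endif
}

/* Hypothetical helper: print the number of logical processors OpenMP sees
   and the number of threads that actually join a parallel region. */
void report_threads(void)
{
#ifdef _OPENMP
    int team = 0;                      /* shared by default in the region */
    #pragma omp parallel
    {
        #pragma omp single
        team = omp_get_num_threads();  /* one thread records the team size */
    }
    printf("logical processors: %d, threads in region: %d\n",
           omp_get_num_procs(), team);
#else
    printf("compiled without OpenMP: 1 thread\n");
#endif
}
```

On a two-core, four-thread i5, `omp_get_num_procs()` would typically report 4, because it counts logical processors rather than physical cores.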
I have no way to test the algorithm on a "real" 8-core machine. On my i5, Task Manager shows 25% total CPU usage with 1 thread, 50% with 2 threads, and 96-99% with 4 threads, as expected.
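Task Manager reports usage relative to the four logical processors (two cores, two hardware threads each), which matches the readings above: each fully busy thread accounts for about a quarter of the total.

$$
\text{CPU usage} \approx \frac{\text{busy logical processors}}{4} \times 100\%:
\qquad \tfrac{1}{4} = 25\%,\quad \tfrac{2}{4} = 50\%,\quad \tfrac{4}{4} = 100\%
$$

A reading near 100% therefore means all four logical processors are occupied; on its own it does not say that four cores' worth of arithmetic is being done, since each pair of hardware threads shares one physical core.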
So what could be causing this? Why doesn't the computation time halve again when going from 2 to 4 threads?
Here is the parallel code segment:
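```c
/* Column k of L: update every row below the diagonal in parallel.
   i is the loop variable of the omp for, so it is private anyway. */
#pragma omp parallel for schedule(guided) shared(L,A) private(i)
for (i = k+1; i < row; i++) {
    double dummy = 0;                  /* per-thread partial dot product */
    for (int nn = 0; nn < k; nn++) {
        dummy += L[i][nn]*L[k][nn];
    }
    /* assign once, after the dot product is complete (also runs when k == 0) */
    L[i][k] = (A[i][k] - dummy)/L[k][k];
}
```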