Good day everyone!
I'm working on a molecular dynamics simulation and recently started trying to parallelize it. At first sight it looked simple enough: put a #pragma omp parallel for directive in front of the most time-consuming loops. But as it happens, the functions in those loops operate on arrays, or, to be precise, on arrays that belong to an object of my class holding all the information about the particle system and the functions acting on it. When I added that #pragma directive before one of the most time-consuming loops, the computation time actually increased several times, even though my 2-core / 4-thread processor was fully loaded.
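Roughly, the kind of loop I mean looks like this (a simplified sketch only; the class, member and function names here are made-up placeholders, not my real code):

#include <omp.h>
#include <vector>

// Placeholder class: holds the particle data and the functions acting on it.
class ParticleSystem {
public:
    std::vector<double> x, fx;   // e.g. coordinates and forces

    // The kind of time-consuming loop I tried to parallelize: every
    // iteration reads and writes arrays owned by this object.
    void computeForces() {
        #pragma omp parallel for
        for (int i = 0; i < static_cast<int>(x.size()); ++i) {
            fx[i] = 0.0;
            // ... expensive per-particle work that updates fx[i] ...
        }
    }
};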
To sort this out I wrote another, simpler program. The test program runs two identical loops, the first in parallel and the second in serial, and measures the execution time of each. The results surprised me: when the first loop was run in parallel, its computation time decreased compared to serial mode (1500 ms versus 6000 ms), but the computation time of the second loop increased drastically (15000 ms against 6000 ms in serial).
I tried using private() and firstprivate() clauses, but the results were the same. Shouldn't every variable defined and initialized before the parallel region be shared automatically anyway? The computation time of the second loop goes back to normal if it works on another vector, vec2, but creating a new vector for every iteration is clearly not an option. I've also tried putting the actual update of vec1 inside a #pragma omp critical section, but that didn't help either, and neither did adding a shared(vec1) clause.
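For reference, the variants I tried looked roughly like this (just a sketch of the clauses and the critical section; the function name is a placeholder, and the complete test program is given below):

#include <omp.h>
#include <vector>

void parallelVariant(std::vector<double>& vec1, int N1, int N2, int dim) {
    double temp = 0.0;

    // private(temp) gives each thread its own copy (firstprivate(temp)
    // would additionally copy in the initial value).
    #pragma omp parallel for private(temp) shared(vec1)
    for (int i = 0; i < N1; ++i) {
        temp = 0.0;
        for (int j = 0; j < N2; ++j)
            for (int k = 0; k < dim; ++k)
                temp += j;

        // I also tried guarding the update of vec1 with a critical section.
        #pragma omp critical
        vec1[i] += temp;
    }
}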
I would appreciate it if you could point out my errors and show me the proper way.
Is it necessary to put that private(i) into the code?
Here is the test program:
#include "stdafx.h"
#include <omp.h>
#include <array>
#include <time.h>
#include <vector>
#include <iostream>
#include <Windows.h>
using namespace std;
#define N1 1000
#define N2 4000
#define dim 1000
int main(){
    vector<int> res1, res2;
    vector<double> vec1(dim), vec2(N1);
    clock_t t, tt;
    int k = 0;
    for (k = 0; k < dim; k++){
        vec1[k] = 1;
    }
    t = clock();
    #pragma omp parallel
    {
        double temp;
        int i, j, k;
        #pragma omp for private(i)
        for (i = 0; i < N1; i++){
            for (j = 0; j < N2; j++){
                for (k = 0; k < dim; k++){
                    temp += j;
                }
            }
            vec1[i] += temp;
            temp = 0;
        }
    }
    tt = clock();
    cout << tt - t << endl;
    for (int k = 0; k < dim; k++){
        vec1[k] = 1;
    }
    t = clock();
    for (int g = 0; g < N1; g++){
        for (int h = 0; h < N2; h++){
            for (int y = 0; y < dim; y++){
                vec1[g] += h;
            }
        }
    }
    tt = clock();
    cout << tt - t << endl;
    getchar();
}
Thank you for your time!
P.S. I use Visual Studio 2012; my processor is an Intel Core i3-2370M. My assembly file is in two parts:
Comments:

You don't need to make i private, as the loop variable of the #pragma omp for loop is made private automatically, and at any rate it is defined inside the parallel section, so it will be private to each thread -- that is also true of j, k, and temp. Note though that your temp is undefined when you start adding to it; you should initialize it to zero. As to the other question - can you confirm that what you mean is that the serial portion of the program runs more slowly if you run the parallel portion with multiple threads? If so, that's probably a memory-affinity thing. – Jonathan Dursi

What are the values of N1, N2 and dim? What compiler do you use? (I would assume Visual Studio or some other Windows IDE, given that getchar() at the end of your program...) What kind of CPU do you have (we already know that it is dual-core, but what exactly)? – Hristo Iliev

[...] files with the .asm ending. Then paste their content to pastebin and add the URLs to your question. Do that with OpenMP enabled and with OpenMP disabled (please mark clearly which file is from which case in the pastes). Maybe we should continue this discussion in chat. – Hristo Iliev
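Edit: following Jonathan Dursi's comment, a corrected version of the parallel loop would look roughly like the sketch below: temp is initialized to zero before it is accumulated, and no explicit private clause is needed, since the loop variable of the #pragma omp for loop is privatized automatically and the other variables are declared inside the loop.

    // Sketch of the parallel region with the points from the comments applied:
    // i, j, k and temp are private simply because they are declared inside
    // the work-sharing loop, and temp starts at zero.
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N1; i++) {
            double temp = 0.0;              // initialized before accumulation
            for (int j = 0; j < N2; j++) {
                for (int k = 0; k < dim; k++) {
                    temp += j;
                }
            }
            vec1[i] += temp;                // each thread writes a distinct element
        }
    }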