I am using #pragma omp barrier to make sure that all my parallel threads meet at the same point before continuing (no fancy conditional branching, just a straight loop). However, I suspect that the barrier does not actually guarantee that the threads resume at the same instant, only that they have all reached it, because these are the results I am getting:
0: func() size: 64 Time: 0.000414 Start: 1522116688.801262 End: 1522116688.801676
1: func() size: 64 Time: 0.000828 Start: 1522116688.801263 End: 1522116688.802091
Thread 0 starts about a microsecond earlier than thread 1 and shows a somewhat unrealistic completion time of 0.414 ms. Incidentally, a single-core, single-thread run averages around 0.800 ms. (Please forgive me if my units are off; it is late.)
My question is: is there a way in OpenMP to ensure that all threads start at the same time? Or would I have to bring in another library such as pthreads to get this functionality?
pause / cmp byte [flag], 0 / je .retry busy-wait loop. (Write that loop in C with atomics and _mm_pause() if you want.) – Peter Cordes
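For anyone who wants the C version of that loop, here is a minimal sketch using C11 atomics and _mm_pause(). The names start_flag, wait_for_start, release_all and cpu_relax are made up for illustration, and the fallback for non-x86 targets is an assumption. Each worker would call wait_for_start() just before the timed region, and one thread would call release_all() once everyone is waiting. This should narrow the gap between start times, but it cannot guarantee a truly simultaneous start, since the OS can still deschedule a thread at any point.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>              /* _mm_pause() */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)       /* assumption: no spin hint on other targets */
#endif

/* Hypothetical shared flag: false means "keep waiting", true means "go". */
static atomic_bool start_flag = false;

/* Spin until the flag is set; mirrors the pause / cmp / je loop. */
static void wait_for_start(void)
{
    while (!atomic_load_explicit(&start_flag, memory_order_acquire))
        cpu_relax();                /* be polite to the sibling hyperthread */
}

/* Called by exactly one thread to let all spinning threads proceed. */
static void release_all(void)
{
    atomic_store_explicit(&start_flag, true, memory_order_release);
}
```

The acquire/release pair is there so that anything written before release_all() is visible to a thread once it leaves wait_for_start(). The flag is one-shot in this sketch; it would have to be reset before it could be reused for another timed region.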