I am writing a parallel merge sort program. I use fork() to perform the parallel processing. I tried running 2 parallel processes, 4 processes, 8 processes and so on. Then I found that the one running with 2 processes required the least time to finish, i.e the highest performance. I think it's reasonable as my cpu is core 2 duo. For 4,8,16,32 processes, it seems to have a steady declining of performance, but after that the performance fluctuate (doesn't seem to have a pattern). Can someone explain that?
Plus according to the pattern, I have a feeling that when the number of processes used in the program is equal to the number of core that my cpu has, the my program could have the highest performance. But I am 100% sure. Can someone verify me? Or tell me what actually affect the performance of a parallel program.
Thanks in advance!!