This is a follow-up to "Why is my OpenMP implementation slower than a single threaded implementation?".
I followed the answer given there and used task pragmas instead of for pragmas to speed up the code. However, the tasking version runs just as fast as the equivalent sequential program: I see no speedup at all.
The reworked code is here: http://pastebin.com/3SFaNEc4
I simply removed all the for pragmas and replaced them with task pragmas in the recursive procedures.
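For readers who cannot open the link, here is a minimal sketch of the general task-based recursion pattern. It is not the pastebin code: the names range_sum and parallel_range_sum and the 1024-element cutoff are hypothetical, and it assumes the recursion is started by a single thread inside a parallel region.

```c
/* Hypothetical recursive routine: sums a[lo..hi) by halving the range. */
static long range_sum(const int *a, int lo, int hi)
{
    /* Assumed cutoff: small ranges are summed serially to limit task overhead. */
    if (hi - lo < 1024) {
        long s = 0;
        for (int i = lo; i < hi; i++)
            s += a[i];
        return s;
    }

    int mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    /* Each half becomes a task; the results must be shared to be visible here. */
    #pragma omp task shared(left)
    left = range_sum(a, lo, mid);

    #pragma omp task shared(right)
    right = range_sum(a, mid, hi);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return left + right;
}

/* Hypothetical entry point: one thread creates the root call, the team runs the tasks. */
static long parallel_range_sum(const int *a, int n)
{
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = range_sum(a, 0, n);
    return total;
}
```

Built with OpenMP enabled (for example -fopenmp with GCC), the tasks can be picked up by any thread in the team; without that flag the pragmas are ignored and the same code runs sequentially.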
Am I doing anything wrong? I expected to see an almost linear speedup. What do you think?
Thanks!