I have a massive 2-dimensional array that I'm doing calculations on (4.3 billion columns of 32-bit unsigned ints in X by 512 rows in Y), and I'm trying to speed it up with OpenCL.
I obviously cannot do this all at once, so I'm trying a 2-dimensional work group setup.
In each of the 4.3 billion "columns", the next row is generated from the previous row by some math and bit rotation, so I can generate all the values on the fly.
But since the row above the current working row must already hold a value, I need to ensure that when the work groups fire off, the group "above" has already completed. The groups next to each other don't matter: columns don't interact with other columns.
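To make the dependency concrete, here is a rough serial sketch in plain C of what one column looks like. The names `rotl32`, `next_row`, and `fill_column`, the rotation amount, and the XOR constant are placeholders made up for illustration and are not my real math; the only point is that row y needs row y-1 of the same column and nothing from any other column.

```c
#include <stdint.h>
#include <stddef.h>

#define ROWS 512  /* rows per column (Y) */

/* Rotate a 32-bit value left by r bits (valid for 0 < r < 32). */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return (v << r) | (v >> (32u - r));
}

/* Placeholder step: NOT the real calculation, just the same shape
 * (some math plus a bit rotation of the previous row's value). */
static uint32_t next_row(uint32_t prev)
{
    return rotl32(prev, 5) ^ 0x9E3779B9u;
}

/* Fill one column: row y depends only on row y-1 of the same column. */
static void fill_column(uint32_t seed, uint32_t out[ROWS])
{
    out[0] = seed;
    for (size_t y = 1; y < ROWS; ++y)
        out[y] = next_row(out[y - 1]);
}
```

Calling `fill_column(seed, col)` for every column in any order gives the same result; it is only the order within a column that matters.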
How are work groups fired off? Across the X, then reset X to 0 and increment Y by one? Random jumping about? Is there any rhyme or reason to it?
And can work groups be told to hold off until the group 'above' it has finished?
I know that all my "next to", "above", "below", "rows", and "columns" are meaningless in a 2-dimensional array programming sense, but they help me at least visualize the problem.
I can make it work if I can ensure groups don't jump ahead. Just not sure how.
I know there are fences to make sure other work items have caught up.