I have a massive 2-dimensional array that I am doing calculations on (about 4.3 billion 32-bit unsigned ints in X by 512 rows in Y), and I'm trying to speed it up with OpenCL.

I obviously cannot do this all at once, so I'm trying a 2 dimensional work group setup.

Each "column", 4.3billion of them, has the next row generated by some math and bit rotating of the previous row. So I can generate all the values on the fly.

But since the row above the current working row must be set with a value already, I need to ensure that when the work groups fire off, the group "above" has already been completed. The groups next to each other don't matter - columns don't interact with other columns.

How are work groups fired off? Across the X, then reset X to 0 and increment Y by one? Random jumping about? Is there any rhyme or reason to it?

And can work groups be told to hold off until the group 'above' it has finished?

I know that "next to", "above", "below", rows, and columns don't mean much for a 2-dimensional array in a programming sense, but they help me at least visualize the problem.

I can make it work if I can ensure groups don't jump ahead. Just not sure how.

I know there are fences to make sure other work items have caught up.

1 Answer

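There is no way to synchronise between work-groups within a kernel invocation in OpenCL, and the order in which work-groups are executed is not defined, so you cannot rely on the group 'above' having finished. If you need global memory synchronisation across multiple work-groups, you need to enqueue multiple kernels.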
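
To make the multiple-kernel idea concrete, here is a minimal sketch in plain C that emulates it on the CPU. Each call to launch_row stands in for one kernel enqueue on an in-order command queue, so a row is only started after the row above it is complete. The update rule (rotate left by one bit, then add one), the row-major layout and every function name here are assumptions made for illustration, not the asker's actual math.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (s is reduced modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned s)
{
    s &= 31u;
    return (v << s) | (v >> ((32u - s) & 31u));
}

/* Hypothetical update rule standing in for the real per-row math. */
static uint32_t next_value(uint32_t prev)
{
    return rotl32(prev, 1) + 1u;
}

/* Stand-in for ONE kernel launch: "work-item" i reads prev[i] and writes
 * next[i]. Within a launch the columns can run in any order. */
static void launch_row(const uint32_t *prev, uint32_t *next, size_t columns)
{
    for (size_t i = 0; i < columns; i++)
        next[i] = next_value(prev[i]);
}

/* grid holds rows x columns values in row-major order; row 0 must already
 * contain the seed values. One "launch" per row gives the ordering that
 * work-groups inside a single launch cannot provide. */
void run_rows(uint32_t *grid, size_t rows, size_t columns)
{
    for (size_t row = 1; row < rows; row++)
        launch_row(grid + (row - 1) * columns, grid + row * columns, columns);
}
```

With 512 rows this would mean 511 launches after the seed row, and every row has to live in global memory between launches.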

If each column is an independent piece of work, it sounds like it would be simpler to launch one work-item per column, and have it do every row for that column. Something like the following:

kernel void foo(...)
{
  size_t column = get_global_id(0);  // size_t: a signed 32-bit int cannot index ~4.3 billion columns
  for (int row = 0; row < 512; row++)
  {
    // Do processing for (column,row), based on (column,row-1),
    // which was also processed by this work-item
  }
}

This would just be launched as a 1D kernel, with global size equal to the number of columns.
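
As a rough illustration of the one-work-item-per-column approach, the plain C sketch below emulates it on the CPU. process_column plays the role of a single work-item and keeps only the previous row's value in a local variable, so no row ever needs to be stored just to compute the next one. process_chunk stands in for one 1D launch over a range of columns, since roughly 4.3 billion columns may not fit in a single launch on every device; on a real device the starting column could be passed as a kernel argument or as a global work offset. The seed (the column index), the update rule and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (s is reduced modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned s)
{
    s &= 31u;
    return (v << s) | (v >> ((32u - s) & 31u));
}

/* Hypothetical update rule standing in for the real per-row math. */
static uint32_t next_value(uint32_t prev)
{
    return rotl32(prev, 1) + 1u;
}

/* One "work-item": walk a single column from row 0 down to row rows-1.
 * Returns the value in the last row. The column index is 64-bit so that
 * all ~4.3 billion columns can be addressed. */
uint32_t process_column(uint64_t column, unsigned rows)
{
    uint32_t value = (uint32_t)column;  /* hypothetical seed for row 0 */
    for (unsigned row = 1; row < rows; row++)
        value = next_value(value);      /* depends only on the row above */
    return value;
}

/* Stand-in for one 1D launch covering columns [first, first + count).
 * last_row receives the final value of each column in the chunk. */
void process_chunk(uint64_t first, size_t count, unsigned rows,
                   uint32_t *last_row)
{
    for (size_t i = 0; i < count; i++)
        last_row[i] = process_column(first + i, rows);
}
```

Because columns never interact, the chunks can be processed in any order, and no synchronisation between work-groups is needed.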