Environment: Ubuntu 18.04, OneAPI beta 6
Full code is below, but here's the offending error:
#dpcpp -O2 -g -o so2 so2.cpp -lOpenCL -lsycl
so2.cpp:64:38: error: cannot cast from type 'global_ptr<int>' (aka 'multi_ptr<int, access::address_space::global_space>') to pointer type 'int (*)[nelem]'
int (*xptr)[nelem] = (int (*)[nelem])xaccessor.get_pointer();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
A bit of explanation in case you're wondering why....
When developing data parallel code I often develop what Intel used to call "elemental functions". These are written to operate on a single element of the application (what SYCL would call a work item). I've always found this an easier do with a basic SW development environment, easy to test, and more generally reuseable (scalar, SIMD, CUDA, etc...).
After getting things tested on a single element, moving to data parallel is pretty easy by expanding the calling code without having to rewrite/retest the functions:
int x[NELEM]
fn1(x, NELEM)
becomes
int x[NPROC][NELEM]
for (int p=0; p<NPROC; p++) fn1(x[p], NELEM);
In a SYCL kernel, fn1(x[item.get_linear_id()], NELEM); would be all I need without having to rewite the function(s) to understand ids and/or accessors.
The SYCL issue with the above code is that in the kernal C++ I can not seem recast the accessor pointer to a 2D pointer. This is allowable in the application C++ (see code above).
Maybe this is a bad way to right code, but it makes it easy to develop/test code that works for scalar and data parallel codeand keeps libraries somewhat portable. It's also provides a way around the SYCL 3 dimension limit on buffers/accessors.
Anyway, I'm curious as to what a real SYCL programmer would think.
Full code for toy example:
#include <CL/sycl.hpp>
#include <cstdio>
namespace sycl = cl::sycl;
const int Nproc=3;
const int Nelem=4;
/** elemental function **/
void
fn1(int *h, int n)
{
for (int i=0; i<n; i++) h[i] = 10*h[i]+2*i;
}
int
main(int argc, char *argv[])
{
/** Make some memory **/
int x1d[Nproc * Nelem];
for (int j=0; j<Nproc; j++) {
for (int i=0; i<Nelem; i++) x1d[j*Nelem+i] = 10*j+i;
}
printf("1D\n");
for (int i=0; i<Nelem; i++) {
printf("%d : ", i);
for (int j=0; j<Nproc; j++) printf("%d ", x1d[j*Nelem+i]);
printf("\n");
}
/** Reshape it into 2D **/
int (*x2d)[Nelem] = (int (*)[Nelem])x1d;
for (int j=0; j<Nproc; j++) fn1(x2d[j], Nelem);
printf("2D\n");
for (int i=0; i<Nelem; i++) {
printf("%d : ", i);
for (int j=0; j<Nproc; j++) printf("%d ", x2d[j][i]);
printf("\n");
}
/** SYCL setup **/
sycl::device dev = sycl::default_selector().select_device();
std::cout << "Device: "
<< "name: " << dev.get_info<sycl::info::device::name>() << std::endl
<< "vendor: " << dev.get_info<sycl::info::device::vendor>() << std::endl;
sycl::queue q(dev);
{
sycl::buffer<int, 1> xbuffer(x1d, sycl::range<1> {Nproc*Nelem});
q.submit([&](sycl::handler& cgh) {
int nelem = Nelem;
auto xaccessor = xbuffer.get_access<sycl::access::mode::read_write, sycl::access::target::global_buffer>(cgh);
cgh.parallel_for<class k0>(
sycl::range<1> {Nproc},
[=] (sycl::item<1> item) {
int idx = item.get_linear_id();
#if 0
int *xptr = (int *)xaccessor.get_pointer(); // doing this does work so we _can_ get a real pointer
fn1(xptr + nelem*idx, nelem);
#else
int (*xptr)[nelem] = (int (*)[nelem])xaccessor.get_pointer();
//int *ptr = (int *)xaccessor.get_pointer(); // splitting it into two doesn't work either
//int (*xptr)[nelem] = (int (*)[nelem])ptr;
fn1(xptr[idx], nelem);
#endif
}
);
}
);
}
printf("2D SYCL\n");
for (int i=0; i<Nelem; i++) {
printf("%d : ", i);
for (int j=0; j<Nproc; j++) printf("%d ", x1d[j*Nelem+i]);
printf("\n");
}
}
Edit 1:
Per illuhad's comment I tried to flesh out some alternatives.
First the two commented lines seem like they should do what he suggests:
int *ptr = (int *)xaccessor.get_pointer();
int (*xptr)[nelem] = (int (*)[nelem])ptr;
but in fact it yields this error:
error: cannot initialize a variable of type 'int (*)[nelem]' with an rvalue of type 'int (*)[nelem]'
int (*xptr)[nelem] = (int (*)[nelem])ptr;
^ ~~~~~~~~~~~~~~~~~~~
adding a "get()" to the end of the get_pointer yields the same.
Curiously, addressing the "initialize" part of the error:
int *ptr = (int *)xaccessor.get_pointer().get();
int (*xptr)[nelem];
xptr = (int (*)[nelem])ptr;
Yields the amusing error:
error: incompatible pointer types assigning to 'int (*)[nelem]' from 'int (*)[nelem]'
xptr = (int (*)[nelem])ptr;
^~~~~~~~~~~~~~~~~~~
So if/when someone gets the time, I'm still curious...
get_pointer()
returns aglobal_ptr<T>
, not a pointer.global_ptr<T>
defines a conversion operator toT*
, such that it can be implicitly converted toT*
. This is why the 1D case works. For 2D, you would first need to explicitly convert theglobal_ptr
toT*
using theget()
member function. You can then cast this pointer to whatever you like. – illuhad