I'm trying to get started with pyOpenCL and GPGPU in general.
For the below dot product code I'm getting fairly different results between the GPU and CPU versions. What am I doing wrong?
The difference of ~0.5% seems large for floating point errors to account for the difference. The difference does seem to increase with array size (~9e-8 relative difference with array size of 10000). Maybe it's an issue with combining results across blocks...? Either way, color me disconcerted.
I don't know if it matters: I'm running this on a MacBook Air, Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz, with Intel HD Graphics 5000.
Thanks in advance.
import pyopencl as cl
import numpy
from pyopencl.reduction import ReductionKernel
import pyopencl.clrandom as cl_rand
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
dot = ReductionKernel( ctx, \
dtype_out = numpy.float32, \
neutral = "0", \
map_expr = "x[i]*y[i]", \
reduce_expr = "a+b", \
arguments = "__global const float *x, __global const float *y"
)
x = cl_rand.rand(queue, 100000000, dtype = numpy.float32)
y = cl_rand.rand(queue, 100000000, dtype = numpy.float32)
x_dot_y = dot(x,y).get() # GPU: array(25001304.0, dtype=float32)
x_dot_y_cpu = numpy.dot(x.get(), y.get()) # CPU: 24875690.0
print abs(x_dot_y_cpu - x_dot_y)/x_dot_y # 0.0050496689740063489