I'm having a hard time doing a basic scalar operation with PyOpenCL. Basically, what I'm trying to do is, given a float array, multiply each element by a scalar float and put the result in a new buffer. This should be easy, but for some reason it's not working as it should.
This is the code I'm using (variables ending in _h are host variables; variables ending in _g are device variables):
import numpy as np
import pyopencl as cl
# Device Init
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
MF = cl.mem_flags
# Host Initial Variables
a_h = np.array([1.0, 2.0, 3.0, 4.0])
b_h = np.float32(2.0)
# DEVICE Variable Allocation
a_g = cl.Buffer(ctx, MF.READ_ONLY | MF.COPY_HOST_PTR, hostbuf=a_h)
c_g = cl.Buffer(ctx, MF.WRITE_ONLY, a_h.nbytes)
# DEVICE's Kernel - Multiply each element of the array a_g by the scalar b_g and put the result on the array c_g
source = """
__kernel void mult(float b_g, __global float *a_g, __global float *c_g){
const int gid = get_global_id(0);
c_g[gid] = b_g * a_g[gid];
}
"""
prg = cl.Program(ctx, source).build()
prg.mult(queue, a_h.shape, None, b_h, a_g, c_g)
# Export The Result On The DEVICE Back To The HOST
c_h = np.empty_like(a_h)
cl.enqueue_copy(queue, c_h, c_g)
# Output
print c_h
The expected output was:
[2.0 4.0 6.0 8.0]
This was the output:
[ 2.56000000e+002 5.12000000e+002 -1.73777009e+308 -1.73777009e+308]
I don't understand why. I've tried reading the PyOpenCL project page, but to be honest I didn't really understand much of it. I guess I'm getting either the kernel or the kernel call wrong.
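One possible explanation worth checking, sketched here with NumPy only (no OpenCL device needed): np.array([1.0, 2.0, 3.0, 4.0]) defaults to float64, while the kernel reads and writes 32-bit float. The snippet below is a host-side simulation under that assumption, not what the driver actually runs. It reinterprets the float64 bytes as float32, multiplies the first four 32-bit values (one per work-item), and views the result as float64 again. It assumes a little-endian host; the names as_f32, out_f32, simulated and a_fixed are illustrative only.

```python
import numpy as np

a_h = np.array([1.0, 2.0, 3.0, 4.0])   # NumPy default dtype is float64 (8 bytes each)
b_h = np.float32(2.0)

# Assumption: the kernel's `float *` sees the same bytes as 32-bit floats.
as_f32 = a_h.view(np.float32).copy()    # 8 float32 values from 4 float64 values

# Only a_h.shape[0] == 4 work-items run, so only the first 4 float32
# slots are multiplied. The rest of the output buffer is never written
# (zeros here; on a real device it would be uninitialized memory).
n = a_h.shape[0]
out_f32 = np.zeros_like(as_f32)
out_f32[:n] = b_h * as_f32[:n]

# Copying the buffer back into a float64 array reinterprets the bytes again.
simulated = out_f32.view(np.float64)
print(simulated[:2])                    # first two values: 256.0 and 512.0

# With a float32 host array the element sizes match the kernel's `float`.
a_fixed = a_h.astype(np.float32)
print(b_h * a_fixed)
```

If this is indeed the cause, the first two simulated values match the 2.56e+002 and 5.12e+002 in the output above, and creating a_h with dtype=np.float32 (so that c_h from np.empty_like is float32 too) would be the thing to try.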
I also tried declaring the kernel like this:
__kernel void mult(__global float b_g, __global float *a_g, __global float *c_g)
But, as expected, it didn't work, because I didn't create a pointer for b_g, nor do I know how to create one. The error was:
:2:39: error: parameter may not be qualified with an address space
__kernel void mult(__global float b_g, __global float *a_g, __global float *c_g){
^
My main idea behind this is simple: since I'm going to use the value b_g as something common to all the workers, I want to put it in global memory once so that every worker can access it, instead of repeating the value for every worker.
I believe this should be really simple but I'm new to parallel computing and have no idea how to fix this.
Thank you.