I have GPU code that works fine when the kernels are launched with funcname<<< >>>.
I then changed the code to work with PTX files.
Now every call to cudaMemcpyToSymbol returns the error code: invalid symbol.
Here is a snippet from the .cu file:
{
__device__ __constant__ void *devInFramePtrs [20];
__device__ __constant__ void *devOutFramePtrs [20];
__device__ __constant__ void *devProcFramePtrs [60];
__device__ __constant__ void *devProcOutFramePtrs [60];
__device__ __constant__ AlgorithmParms *devAlgoParmsPtr;
__device__ __constant__ AlgorithmStats *devStatParmsPtr;
__device__ float diamondOffsetsGlobal[36];
}
================= In the PTX file I get:
.global .align 4 .b8 devInFramePtrs[80];
.global .align 4 .b8 devOutFramePtrs[80];
.global .align 4 .b8 devProcFramePtrs[240];
.global .align 4 .b8 devProcOutFramePtrs[240];
.global .align 4 .u32 devAlgoParmsPtr;
.global .align 4 .u32 devStatParmsPtr;
.global .align 4 .b8 diamondOffsetsGlobal[144];
================= The host code is then:
err = cudaMemcpyToSymbol("devInFramePtrs", gDevInFramePtrs, sizeof(void *) * 20, 0, cudaMemcpyHostToDevice);
err = cudaMemcpyToSymbol("devOutFramePtrs", gDevOutFramePtrs, sizeof(void *) * 20, 0, cudaMemcpyHostToDevice);
err = cudaMemcpyToSymbol("devProcFramePtrs", gDevProcFramePtrs, sizeof(FRAME_BASE_TYPE *) * numDevInProcFramePtrs3, 0, cudaMemcpyHostToDevice);
err = cudaMemcpyToSymbol("devProcOutFramePtrs", gDevProcOutFramePtrs, sizeof(FRAME_BASE_TYPE *) * numDevOutProcFramePtrs3, 0, cudaMemcpyHostToDevice);
err = cudaMemcpyToSymbol("diamondOffsetsGlobal", &(diamondOffset[0][0]), sizeof(float) * 36, 0, cudaMemcpyHostToDevice);
========================
All of the calls return the value 11: invalid symbol.
Details: CUDA 4.2, running under VS2010, Win7, 32-bit application.
=========================
Here is the compile script:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2010
-ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"
-I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include"
-G --keep-dir "Debug" -maxrregcount=32 --machine 32 -ptx -o "U:\filterKernel.ptx" "U:\filterKernel.cu"
As I said, the only change was to generate the PTX file and change the function calls. Also note that I get the error whether the variables are in constant memory or regular global memory.
Thanks in advance.
Are you calling cudaMemcpyToSymbol after loading the PTX with cuModuleLoadData[Ex]? Also note that using strings to reference symbols is deprecated after CUDA 4.1 (you need to pass the symbol itself), so you probably shouldn't be doing it this way anyway. In your case, I believe you need to use cuModuleGetGlobal to get a device pointer that you can copy to. – harrism