I'm having trouble using cudaMemcpy to copy data back from my GPU. cudaGetErrorString reports "invalid argument", and the failure happens on the memcpy from device to host. Here is my isolated code:
// Host-side buffer that the particles are rendered from.
GLfloat* particleRenderData = new GLfloat[particleContainer.size() * 4];
// particlePosBuffer lives on the GPU and is used to copy updated particle data
// back to the OpenGL buffer.
GLfloat *particlePosBuffer;
cudaStatus = cudaMalloc((void**)&particlePosBuffer, particleContainer.size() * sizeof(GLfloat)* 4);
CUDA_CHECK_STATUS;
// calcBuffer holds our points. CUDA will modify it on the GPU.
Point3D *calcBuffer;
cudaStatus = cudaMalloc((void**)&calcBuffer, particleContainer.size() * sizeof(Point3D));
CUDA_CHECK_STATUS;
cudaStatus = cudaMemcpy(calcBuffer, &particleContainer[0], particleContainer.size() * sizeof(Point3D), cudaMemcpyHostToDevice);
CUDA_CHECK_STATUS;
update<<<1, 1>>>(calcBuffer, particlePosBuffer, particleContainer.size(), 1.0);
cudaThreadSynchronize();
cudaStatus = cudaMemcpy(particlePosBuffer, particleRenderData, particleContainer.size() * sizeof(GLfloat)* 4, cudaMemcpyDeviceToHost);
CUDA_CHECK_STATUS;
particleContainer is a vector of Point3D, which is a class I wrote. The first memcpy (host to device) succeeds; I have compared the host and device buffers to be sure of this. The update kernel is probably not the cause, since the error occurs with or without the launch, and likewise with or without the synchronize. I've tried several things, including casting particlePosBuffer and particleRenderData to void*, passing just their references, and both together.
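For comparison, here is a minimal standalone round trip through the same calls. It is only a sketch, not code from my project: the `scale` kernel, the buffer names, and the `check` helper are hypothetical. It assumes the documented signature `cudaMemcpy(dst, src, count, kind)`, where the destination pointer comes first and `kind` has to match which side each pointer lives on. That ordering may be worth checking against the last cudaMemcpy call above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element in place.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: print the failing step and report whether it succeeded.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    const int n = 8;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);  // host-side buffer
    float* dev = nullptr;              // device-side buffer

    if (!check(cudaMalloc((void**)&dev, bytes), "cudaMalloc")) return 1;

    // Host to device: destination is the device pointer, source is the host pointer.
    if (!check(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice),
               "memcpy H2D")) return 1;

    scale<<<1, n>>>(dev, n);
    if (!check(cudaGetLastError(), "kernel launch")) return 1;
    if (!check(cudaDeviceSynchronize(), "synchronize")) return 1;

    // Device to host: destination is the host pointer, source is the device pointer.
    if (!check(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost),
               "memcpy D2H")) return 1;

    for (int i = 0; i < n; ++i) std::printf("%.1f ", host[i]);
    std::printf("\n");

    cudaFree(dev);
    return 0;
}
```

In both copies the first argument is where the bytes end up, so only the pointer order and the `kind` flag change between the two directions.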
I'm using CUDA 6.5 with Visual Studio 2013. The GPU is a GTX 770, and I'm compiling for compute_30, sm_30.
I'm hoping someone can help me with this; I'm really stuck here.