According to the CUDA Programming Guide, page 122, it is possible to dynamically allocate memory inside a device/global function, as long as the code is compiled for compute capability 2.x.
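For illustration, here is a minimal sketch of the kind of code this refers to. It is not my actual project code: the kernel name `allocKernel`, the buffer size, and the launch configuration are hypothetical and chosen only to show a `malloc` call inside a `__global__` function.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small buffer from the device heap.
__global__ void allocKernel()
{
    // In-kernel malloc is only available on compute capability 2.x and higher.
    int *buf = (int *)malloc(4 * sizeof(int));
    if (buf == NULL) {
        return;  // allocation failed (device heap exhausted)
    }
    buf[0] = threadIdx.x;
    printf("thread %d wrote %d\n", threadIdx.x, buf[0]);
    free(buf);   // memory from device-side malloc is released with device-side free
}

int main()
{
    // Launch one block of four threads (arbitrary, illustrative sizes).
    allocKernel<<<1, 4>>>();

    // Wait for the kernel to finish and report any runtime error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

My understanding is that a file like this should build when every target in the build is 2.x, for example with `nvcc -arch=sm_20` on a toolkit that still supports that target. If any 1.x target is also in the build, the same source gets compiled for it too, and there `malloc` is only a host function.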
My problem is that when I attempt this I get the command line message:
The command "some command" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" etc...
This is followed by an error saying that you cannot call a host function (malloc) from a device/global function.
The command line above shows that it is still generating code for compute 1.x (the compute_10/sm_10 target). I am using VS2010 and have "Code Generation" set to "compute_20,sm_20" in the "CUDA C/C++" property page, so I am not sure why it is still compiling for compute 1.x. I am definitely using a card that supports 2.x. Any ideas?