I tested convn on two GPUs, a Quadro 6000 and a Titan, and both take longer than the CPU.
A quick test can be run in MATLAB:
maxloop = 1000;
tic   % CPU version
for i = 1:maxloop
    output2 = convn(rand(320,1), rand([6,1,300]), 'full');
end
toc
tic   % GPU version
for i = 1:maxloop
    goutput2 = convn(gpuArray.rand(320,1), gpuArray.rand([6,1,300]), 'full');
end
toc
It takes 0.52 s on the CPU, but 7 s on the Quadro 6000 and roughly 15 s on the Titan.
What I have tried:
1) Replacing the rand inputs with fixed, predefined values does not give any improvement.
2) Preallocating the GPU output (goutput2) does not help much.
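For reference, here is a minimal sketch of variant 1), with the inputs created once outside the loops. The variable names a, b, ga and gb are placeholders. The wait(gpuDevice) call is an assumption added for this sketch, so that toc does not stop before the queued GPU work has finished; it is not necessarily how the timings in this post were taken.

```matlab
maxloop = 1000;

% Fixed inputs, created once so that rand() is not part of the timed loop
a  = rand(320,1);
b  = rand([6,1,300]);
ga = gpuArray(a);   % same data, copied to the GPU once
gb = gpuArray(b);

% CPU timing
tic
for i = 1:maxloop
    output2 = convn(a, b, 'full');
end
toc

% GPU timing
tic
for i = 1:maxloop
    goutput2 = convn(ga, gb, 'full');
end
wait(gpuDevice);    % block until all queued GPU work has completed
toc
```

With the inputs fixed like this, the two loops time only the convn calls themselves, so any remaining gap is not caused by random number generation or host-to-device copies.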
[Image: Quadro]

[Image: Titan]

I also ran the test from the first answer and obtained the same result as reported there, with m=1000; n=100; k=5;
Elapsed time is 2.367453 seconds. %%%%GPU
Elapsed time is 27.502952 seconds. %%%%CPU
My question is: why does my own test code run slower on the GPU, and what is causing it?
