2
votes

I am totally new to CUDA and I would like to write a CUDA kernel that calculates a convolution, given an input matrix, a convolution (or filter) matrix, and an output matrix.

Note: I want each thread of the CUDA kernel to calculate one value in the output matrix.

How can I do this?

3
As far as I remember, there were dozens of examples on the CUDA website, especially given that convolution is a very common task. Has this changed, or haven't you found anything there? - CWBudde
@CWBudde thank you for your comment. Yes, I found a couple of long examples covering many hard cases on various websites, but unfortunately I haven't found a straightforward one yet. I would be more than happy if you have any. - Bilgin

3 Answers

1
votes

If the filters cover the full range of the matrix, then the operation can be converted directly to a cublasSgemm call.

For example, suppose the dimensions of the matrix are 5 * 4 and you need 130 filters; then the filter matrix to be trained has dimensions 130 * 20, and the 5 * 4 input can be treated as a 20 * 1 vector.

In this way, the computation speed is near optimal: the convolution becomes a matrix multiplication between m1 (130, 20) and m2 (20, 1).
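A minimal host-side sketch of this idea, assuming the filter matrix is stored row-major on the device and the input is already flattened to a 20 * 1 column vector (the names and the omitted error checking are my own, not from the answer):

```cuda
// y (130 x 1) = F (130 x 20) * x (20 x 1) via one GEMM call.
// cuBLAS is column-major, so a row-major 130 x 20 F looks to it like a
// 20 x 130 matrix; CUBLAS_OP_T restores the intended orientation.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void apply_filters(const float *dF,  // device ptr, 130 x 20, row-major
                   const float *dX,  // device ptr, 20 x 1
                   float *dY)        // device ptr, 130 x 1 (output)
{
    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.0f, beta = 0.0f;
    // m = 130 outputs, n = 1 column, k = 20 elements per filter
    cublasSgemm(h, CUBLAS_OP_T, CUBLAS_OP_N,
                130, 1, 20,
                &alpha,
                dF, 20,    // lda = row length of row-major F
                dX, 20,
                &beta,
                dY, 130);
    cublasDestroy(h);
}
```

Each row of F holds one flattened 5 * 4 filter, so the single GEMM evaluates all 130 filters at once.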

1
votes

I would like to write a cuda kernel that calculates a convolution given an input matrix, convolution (or filter) and an output matrix.

You might be interested in this treatment of the subject (although it's a little old). Or look at the CUDA convolution kernel sample programs: non-separable and separable

I want each thread of the cuda kernel to calculate one value in the output matrix.

If you follow the link, you'll realize you don't quite want that. In other words: don't make rigid assumptions about how your kernel should divide work among the threads; you might change your mind later.

0
votes

If you are looking for an image convolution kernel, this link may be helpful: Two Dimensional (2D) Image Convolution in CUDA by Shared & Constant Memory: An Optimized Way.

In my opinion, using each thread to calculate one pixel or position in the output may not be a very good idea. Consider how the sub-region for the convolution is loaded, and whether the threads in the same warp read contiguous memory on each access. Otherwise, the kernel may be limited by data loading even when hundreds of threads are available.

That said, you can start by writing the code you described and then use the profiler (nvvp) for further optimization suggestions.
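For reference, a minimal "one thread per output element" kernel along the lines the question describes might look like this (the names, the "valid" output size, and the cross-correlation convention are assumptions on my part):

```cuda
// Naive 2D convolution: each thread computes one output element.
// Input is H x W, the filter is K x K, and the output is the
// "valid" region (H-K+1) x (W-K+1), all stored row-major.
__global__ void conv2d_naive(const float *in, int H, int W,
                             const float *filt, int K,
                             float *out)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int oH = H - K + 1, oW = W - K + 1;
    if (row >= oH || col >= oW) return;

    float acc = 0.0f;
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j)
            acc += in[(row + i) * W + (col + j)] * filt[i * K + j];
    out[row * oW + col] = acc;
}

// Launch sketch, one thread per output element:
// dim3 block(16, 16);
// dim3 grid((oW + block.x - 1) / block.x, (oH + block.y - 1) / block.y);
// conv2d_naive<<<grid, block>>>(d_in, H, W, d_filt, K, d_out);
```

This version makes no use of shared or constant memory, which is exactly why the profiler will likely flag it as memory-bound; the linked shared/constant-memory article shows the next step.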