How to multiply 2 OpenCV mats using a GPU

Question

In OpenCV, I can multiply an RGB 1920 x 1080 mat by a 3 x 3 Mat to change the color composition of my source Mat. Once my source mat is properly shaped, I can use the '*' operator to perform the multiplication. This operator is not available when using a cv::gpu::GpuMat.

My question is: how would I format my input source Mat to use cv::gpu::gemm? Can I even use cv::gpu::gemm?

From what I can tell, this is the only call in the OpenCV library that performs matrix multiplication on the GPU. cv::gpu::gemm wants to see a CV_32FC1 or CV_64FC1 type Mat. The type I normally use with the CPU is CV_32FC3.

//sourceMat is CV_32FC3 1920 x 1080 Mat
Mat sourceMat = matFromBuffer(data->bufferA, data->widthA, data->heightA);

//This is the color Matrix
float matrix[3][3] = {{1.057311, -0.204043, 0.055648},
{ 0.041556, 1.875992, -0.969256},
{-0.498535,-1.537150, 3.240479}};

Mat colorMatrixMat = Mat(3, 3, CV_32FC1, matrix).t();

//Color Correct the Mat
Mat linearSourceMat = sourceMat.reshape(1, 1080*1920);
Mat multipliedMatrix = linearSourceMat * colorMatrixMat;
Mat recoloredMat = multipliedMatrix.reshape(3, 1080);

Update: As a test, I created the test routine:

static int gpuTest(){

    float matrix[9] = {1.057311, -0.204043, 0.055648, 0.041556, 1.875992, -0.969256, -0.498535,-1.537150, 3.240479};
    Mat matrixMat = Mat(1, 9, CV_32FC1, matrix).t();
    cv::gpu::GpuMat gpuMatrixMat;
    gpuMatrixMat.upload(matrixMat);

    float matrixDest[9] = {1,1,1,1,1,1,1,1,1};
    Mat matrixDestMat = Mat(1, 9, CV_32FC1, matrixDest).t();
    cv::gpu::GpuMat destMatrixMat;
    destMatrixMat.upload(matrixDestMat);

    cv::gpu::GpuMat nextMat;
    cv::gpu::gemm(gpuMatrixMat, destMatrixMat, 1, cv::gpu::GpuMat(), 0, nextMat);

    return 0;
}

and the error I receive is:

OpenCV Error: Assertion failed (src1Size.width == src2Size.height) in gemm, file /Users/myuser/opencv-2.4.12/modules/gpu/src/arithm.cpp, line 109
libc++abi.dylib: terminating with uncaught exception of type cv::Exception: /Users/myuser/opencv-2.4.12/modules/gpu/src/arithm.cpp:109: error: (-215) src1Size.width == src2Size.height in function gemm

Now how can the src1Size.width be equal to src2Size.height? The width and height are different.

Wouldn't sourceMat.reshape(1, 1080*1920); reduce the channels to 1, therefore making linearSourceMat have type CV_32FC1? Your colorMatrixMat is also CV_32FC1. So it seems to me that gemm should work with your data as is. — Dan Mašek
My mat is CV_32FC3, so it is 1080 columns by 1920 rows of 3 elements RGB. When I reshape it to (1, 1080*1920), I am creating 1 column and 1080*1920 rows of RGB values. — blackirishman
From what I read in the documentation the first argument to Mat::reshape is the number of channels. According to my understanding, this would mean you're creating a single channel matrix with 1080*1920 rows and 3 columns. Since you're multiplying by a 3x3 matrix, this has to be true (according to how matrix multiplication is defined). Check the type in the debugger... — Dan Mašek
Let me see if I can try it out locally. I'm not certain I've got my copy of OpenCV built with the GPU support enabled. — Dan Mašek
I've confirmed my assumptions about the type of the matrix. Can't get the gpu version working at this point, haven't really used that before and it's getting a bit late. — Dan Mašek

Dan Mašek · Accepted Answer · 2016-03-22T05:10:59

Here's a minimal working example using OpenCV 3.1.

#include <opencv2/opencv.hpp>
#include <opencv2/cudaarithm.hpp>

#include <iostream>

int main()
{ 
    cv::Mat sourceMat = cv::Mat::ones(1080, 1920, CV_32FC3);

    //This is the color Matrix
    float matrix[3][3] = {
        { 1.057311, -0.204043, 0.055648 }
        , { 0.041556, 1.875992, -0.969256 }
        , { -0.498535, -1.537150, 3.240479 }
        };

    cv::Mat colorMatrixMat = cv::Mat(3, 3, CV_32FC1, matrix).t();

    cv::Mat linearSourceMat = sourceMat.reshape(1, 1080 * 1920);
    cv::Mat multipliedMatrix = linearSourceMat * colorMatrixMat;

    try {
        cv::Mat dummy, gpuMultipliedMatrix;

        // Regular gemm
        cv::gemm(linearSourceMat, colorMatrixMat, 1.0, dummy, 0.0, gpuMultipliedMatrix);
        // CUDA gemm
        // cv::cuda::gemm(linearSourceMat, colorMatrixMat, 1.0, dummy, 0.0, gpuMultipliedMatrix);

        std::cout << (cv::countNonZero(multipliedMatrix != gpuMultipliedMatrix) == 0);
    } catch (cv::Exception& e) {
        std::cerr << e.what();
        return -1;
    }
}

Note that when the beta parameter to gemm(...) is zero, the third input matrix is ignored (based on the code).
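
For reference, with no transposition flags set, gemm computes dst = alpha * src1 * src2 + beta * src3. The following is a minimal plain C++ sketch of that formula, not OpenCV's implementation; the Matrix struct and reference_gemm are hypothetical names used only for illustration. It shows why the third matrix can be left empty when beta is zero: it is never read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel float matrix (illustration only).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data; // rows * cols values, row-major

    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Computes alpha * a * b + beta * c.
// When beta == 0 the matrix c is never read, so it may be empty.
Matrix reference_gemm(const Matrix& a, const Matrix& b, float alpha,
                      const Matrix& c, float beta)
{
    assert(a.cols == b.rows); // inner dimensions must match
    if (beta != 0.0f) {
        assert(c.rows == a.rows && c.cols == b.cols);
    }

    Matrix dst{a.rows, b.cols, std::vector<float>(a.rows * b.cols, 0.0f)};

    for (std::size_t r = 0; r < a.rows; ++r) {
        for (std::size_t col = 0; col < b.cols; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < a.cols; ++k) {
                sum += a.at(r, k) * b.at(k, col);
            }
            float result = alpha * sum;
            if (beta != 0.0f) {
                result += beta * c.at(r, col);
            }
            dst.data[r * dst.cols + col] = result;
        }
    }
    return dst;
}
```

Calling reference_gemm(a, b, 1.0f, Matrix{0, 0, {}}, 0.0f) mirrors the gemm call above: an empty third operand and beta set to zero.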

Unfortunately I don't have a build of OpenCV compiled with cuBLAS available to try it, but it should work.

The following is somewhat speculative...

To make this work with OpenCV 2.4, you will need to add a little bit more. Before calling gemm(...), you need to create GpuMat objects and upload the data.

cv::gpu::GpuMat gpuLinSrc, gpuColorMat, dummy, gpuResult;
gpuLinSrc.upload(linearSourceMat);
gpuColorMat.upload(colorMatrixMat);

Then...

cv::gpu::gemm(gpuLinSrc, gpuColorMat, 1.0, cv::gpu::GpuMat(), 0.0, gpuResult);

and finally download the data back from the GPU.

cv::Mat resultFromGPU;
gpuResult.download(resultFromGPU);
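
As for the assertion in the question's test routine (src1Size.width == src2Size.height): this is the usual requirement for a matrix product, namely that the first operand's column count equals the second operand's row count. In that test both operands are built with Mat(1, 9, CV_32FC1, ...).t(), so each is 9 rows by 1 column, and 1 != 9. A small plain C++ sketch of the check follows; the Shape struct and both helpers are hypothetical, not OpenCV API.

```cpp
#include <cassert>

// Hypothetical shape descriptor: rows = height, cols = width.
struct Shape {
    int rows;
    int cols;
};

// A product a * b is defined only when a.cols == b.rows.
bool can_multiply(const Shape& a, const Shape& b)
{
    return a.cols == b.rows;
}

// Shape of the product a * b; the caller must ensure can_multiply(a, b).
Shape product_shape(const Shape& a, const Shape& b)
{
    assert(can_multiply(a, b));
    return Shape{a.rows, b.cols};
}
```

Here can_multiply(Shape{9, 1}, Shape{9, 1}) is false, which corresponds to the failed assertion, while a (1080 * 1920) x 3 matrix times a 3 x 3 matrix is valid and yields another (1080 * 1920) x 3 matrix.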

Update

Here's a more detailed example to show you what's happening:

#include <opencv2/opencv.hpp>

#include <iostream>
#include <numeric>
#include <string>
#include <vector>

// ============================================================================

// Make a 3 channel test image with 5 rows and 4 columns
cv::Mat make_image()
{
    std::vector<float> v(5 * 4);
    std::iota(std::begin(v), std::end(v), 1.0f); // Fill with 1..20
    cv::Mat seq(5, 4, CV_32FC1, v.data()); // 5 rows, 4 columns, 1 channel

    // Create 3 channels, each with different offset, so we can tell them apart
    cv::Mat chans[3] = {
        seq, seq + 100, seq + 200
    };

    cv::Mat merged;
    cv::merge(chans, 3, merged); // 5 rows, 4 columns, 3 channels

    return merged;
}

// Make a transposed color correction matrix.
cv::Mat make_color_mat()
{
    float color_in[3][3] = {
        { 0.1f, 0.2f, 0.3f } // Coefficients for channel 0
        , { 0.4f, 0.5f, 0.6f } // Coefficients for channel 1
        , { 0.7f, 0.8f, 0.9f } // Coefficients for channel 2
    };

    return cv::Mat(3, 3, CV_32FC1, color_in).t();
}

void print_mat(cv::Mat m, std::string const& label)
{
    std::cout << label << ":\n  size=" << m.size()
        << "\n  channels=" << m.channels()
        << "\n" << m << "\n" << std::endl;
}

// Perform matrix multiplication to obtain result point (r,c)
float mm_at(cv::Mat a, cv::Mat b, int r, int c)
{
    return a.at<float>(r, 0) * b.at<float>(0, c)
        + a.at<float>(r, 1) * b.at<float>(1, c)
        + a.at<float>(r, 2) * b.at<float>(2, c);
}

// Perform matrix multiplication to obtain result row r
cv::Vec3f mm_test(cv::Mat a, cv::Mat b, int r)
{
    return cv::Vec3f(
        mm_at(a, b, r, 0)
        , mm_at(a, b, r, 1)
        , mm_at(a, b, r, 2)
        );
}

// ============================================================================

int main()
{ 
    try {
        // Step 1
        cv::Mat source_image(make_image());
        print_mat(source_image, "source_image");
        std::cout << "source pixel at (0,0): " << source_image.at<cv::Vec3f>(0, 0) << "\n\n";

        // Step 2
        cv::Mat color_mat(make_color_mat());
        print_mat(color_mat, "color_mat");

        // Step 3
        // Reshape the source matrix to obtain a matrix:
        // * with only one channel (CV_32FC1)
        // * where each row corresponds to a single pixel from source
        // * where each column corresponds to a single channel from source
        cv::Mat reshaped_image(source_image.reshape(1, source_image.rows * source_image.cols));
        print_mat(reshaped_image, "reshaped_image");

        // Step 4
        cv::Mat corrected_image;
        // corrected_image = 1.0 * reshaped_image * color_mat
        cv::gemm(reshaped_image, color_mat, 1.0, cv::Mat(), 0.0, corrected_image);
        print_mat(corrected_image, "corrected_image");

        // Step 5
        // Reshape back to the original format
        cv::Mat result_image(corrected_image.reshape(3, source_image.rows));
        print_mat(result_image, "result_image");
        std::cout << "result pixel at (0,0): " << result_image.at<cv::Vec3f>(0, 0) << "\n\n";

        // Step 6
        // Calculate one pixel manually...
        std::cout << "check pixel (0,0): " << mm_test(reshaped_image, color_mat, 0) << "\n\n";
    } catch (cv::Exception& e) {
        std::cerr << e.what();
        return -1;
    }
}

// ============================================================================

Step 1

First we create a small test input image:

The image contains 3 channels of float values, i.e. the data type is CV_32FC3. Let's treat the channels as red, green, blue in that order.
The image contains 5 rows of pixels.
The image contains 4 columns of pixels.
Values in each channel are sequential, green = red + 100 and blue = red + 200.

source_image:
  size=[4 x 5]
  channels=3
[1, 101, 201, 2, 102, 202, 3, 103, 203, 4, 104, 204;
 5, 105, 205, 6, 106, 206, 7, 107, 207, 8, 108, 208;
 9, 109, 209, 10, 110, 210, 11, 111, 211, 12, 112, 212;
 13, 113, 213, 14, 114, 214, 15, 115, 215, 16, 116, 216;
 17, 117, 217, 18, 118, 218, 19, 119, 219, 20, 120, 220]

We can print out a single pixel, to make the structure clearer:

source pixel at (0,0): [1, 101, 201]

Step 2

Create a sample colour correction matrix (transposed) such that:

First column contains coefficients used to determine the red value
Second column contains coefficients used to determine the green value
Third column contains coefficients used to determine the blue value

color_mat:
  size=[3 x 3]
  channels=1
[0.1, 0.40000001, 0.69999999;
 0.2, 0.5, 0.80000001;
 0.30000001, 0.60000002, 0.89999998]

Sidenote: Color Correction Algorithm

We want to transform source pixel S to pixel T using coefficients C

S = [ sr, sg, sb ]
T = [ tr, tg, tb ]
C = [ cr1, cr2, cr3;
      cg1, cg2, cg3;
      cb1, cb2, cb3]

Such that

tr = cr1 * sr + cr2 * sg + cr3 * sb
tg = cg1 * sr + cg2 * sg + cg3 * sb
tb = cb1 * sr + cb2 * sg + cb3 * sb

Which can be represented by the following matrix expression

T = S * C_transpose
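
As a quick sanity check of this expression, here is a minimal plain C++ sketch (no OpenCV) that applies the three equations to every pixel of an interleaved RGB buffer. The names Coefficients and transform_pixels are hypothetical, chosen only for this illustration. C is stored untransposed, one row of coefficients per output channel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// C, untransposed: row i holds the coefficients for output channel i.
using Coefficients = std::array<std::array<float, 3>, 3>;

// Applies t = s * C_transpose to every [r, g, b] triple of an interleaved buffer.
std::vector<float> transform_pixels(const std::vector<float>& rgb, const Coefficients& c)
{
    assert(rgb.size() % 3 == 0); // whole pixels only

    std::vector<float> out(rgb.size());
    for (std::size_t p = 0; p < rgb.size(); p += 3) {
        for (std::size_t i = 0; i < 3; ++i) {
            out[p + i] = c[i][0] * rgb[p] + c[i][1] * rgb[p + 1] + c[i][2] * rgb[p + 2];
        }
    }
    return out;
}
```

With the sample coefficients from Step 2 and the source pixel [1, 101, 201], this gives approximately [80.6, 171.5, 262.4], up to float rounding.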

Step 3

In order to be able to use the above algorithm, we first need to reshape our image into a matrix that:

Contains a single channel, so that the value at each point is just a float.
Has one pixel per row.
Has 3 columns representing red, green, blue.

In this shape, matrix multiplication will mean that each pixel/row from input gets multiplied by the coefficient matrix to determine one pixel/row in the output.

The reshaped matrix looks as follows:

reshaped_image:
  size=[3 x 20]
  channels=1
[1, 101, 201;
 2, 102, 202;
 3, 103, 203;
 4, 104, 204;
 5, 105, 205;
 6, 106, 206;
 7, 107, 207;
 8, 108, 208;
 9, 109, 209;
 10, 110, 210;
 11, 111, 211;
 12, 112, 212;
 13, 113, 213;
 14, 114, 214;
 15, 115, 215;
 16, 116, 216;
 17, 117, 217;
 18, 118, 218;
 19, 119, 219;
 20, 120, 220]
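
The reshape only reinterprets the existing data: a multi-channel Mat stores its channels interleaved, so pixel (row, col) of the image and row (row * cols + col) of the reshaped matrix refer to the same three floats. The sketch below illustrates that index arithmetic in plain C++. Both helpers are hypothetical names, and the sketch assumes a continuous buffer with no padding between rows.

```cpp
#include <cassert>
#include <cstddef>

// Offset of channel `ch` of pixel (row, col) in an interleaved image
// with `cols` pixels per row and `channels` values per pixel.
std::size_t image_offset(std::size_t row, std::size_t col, std::size_t ch,
                         std::size_t cols, std::size_t channels)
{
    assert(col < cols && ch < channels);
    return (row * cols + col) * channels + ch;
}

// Offset of element (r, c) in a single-channel, row-major matrix of width `width`.
std::size_t matrix_offset(std::size_t r, std::size_t c, std::size_t width)
{
    assert(c < width);
    return r * width + c;
}
```

For the 5 x 4 test image, image_offset(1, 2, ch, 4, 3) equals matrix_offset(6, ch, 3) for every channel: the pixel [7, 107, 207] at row 1, column 2 becomes row 6 (counting from zero) of the reshaped matrix.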

Step 4

We perform the multiplication, for example using gemm, to get the following matrix:

corrected_image:
  size=[3 x 20]
  channels=1
[80.600006, 171.5, 262.39999;
 81.200005, 173, 264.79999;
 81.800003, 174.5, 267.20001;
 82.400002, 176, 269.60001;
 83, 177.5, 272;
 83.600006, 179, 274.39999;
 84.200005, 180.5, 276.79999;
 84.800003, 182, 279.20001;
 85.400002, 183.5, 281.60001;
 86, 185, 284;
 86.600006, 186.5, 286.39999;
 87.200005, 188, 288.79999;
 87.800003, 189.5, 291.20001;
 88.400009, 191, 293.60001;
 89, 192.5, 296;
 89.600006, 194, 298.39999;
 90.200005, 195.50002, 300.79999;
 90.800003, 197, 303.20001;
 91.400009, 198.5, 305.60001;
 92, 200, 308]

Step 5

Now we can reshape the image back to the original shape. The result is:

result_image:
  size=[4 x 5]
  channels=3
[80.600006, 171.5, 262.39999, 81.200005, 173, 264.79999, 81.800003, 174.5, 267.20001, 82.400002, 176, 269.60001;
 83, 177.5, 272, 83.600006, 179, 274.39999, 84.200005, 180.5, 276.79999, 84.800003, 182, 279.20001;
 85.400002, 183.5, 281.60001, 86, 185, 284, 86.600006, 186.5, 286.39999, 87.200005, 188, 288.79999;
 87.800003, 189.5, 291.20001, 88.400009, 191, 293.60001, 89, 192.5, 296, 89.600006, 194, 298.39999;
 90.200005, 195.50002, 300.79999, 90.800003, 197, 303.20001, 91.400009, 198.5, 305.60001, 92, 200, 308]

Let's have a look at one pixel from the result:

result pixel at (0,0): [80.6, 171.5, 262.4]

Step 6

Now we can double check our result by performing the appropriate calculations manually (functions mm_test and mm_at).

check pixel (0,0): [80.6, 171.5, 262.4]