Optimizing matrix multiplication with varying sizes

Question

Suppose I have the following data generating process

using Random
using StatsBase

m_1    = [1.0 2.0]
m_2    = [1.0 2.0; 3.0 4.0]
DD     = []
y      = zeros(2,200)

for i in 1:100
    rand!(m_1)
    rand!(m_2)
    push!(DD, m_1)
    push!(DD, m_2)
end

idxs   = sample(1:200,10)
for i in idxs
    DD[i] = DD[1]
end

and suppose given the data, I have the following function


function test(y, DD, n)
    v_1   = [1 2]
    v_2   = [3 4]
    for j in 1:n
        for i in 1:size(DD,1)
            if size(DD[i],1) == 1
                y[1:size(DD[i],1),i] .= (v_1 * DD[i]')[1]
            else
                y[1:size(DD[i],1),i] = (v_2 * DD[i]')'
            end
        end
    end
end

I'm struggling to optimize the speed of test. In particular, memory allocation increases as I increase n. However, I'm not really allocating anything new.

The data generating process captures the fact that I don't know for sure the size of DD[i] beforehand. That is, the first time I call test, DD[1] could be a 2x2 matrix. The second time I call test, DD[1] could be a 1x2 matrix. I think this could be part of the issue with memory allocation: Julia doesn't know the sizes beforehand.

I'm completely stuck. I've tried @inbounds but that didn't help. Is there a way to improve this?

It is not really clear to me what your intention is in this code. A clarification could perhaps make it easier to answer. Furthermore, the calls to rand! probably don't do what you intend. Since they modify their argument in place you are not creating new arrays and instead, you keep changing the old ones in place even when they are part of DD. This means that DD will consist of the same two arrays m_1 and m_2 repeated and shuffled. Do display(DD) and you'll see what I mean. — ahnlabb

ahnlabb ahnlabb · Accepted Answer · 2021-03-04T08:57:15

One important thing to check for performance is that Julia can understand the types. You can check this by running @code_warntype test(y, DD, 1), the output will make it clear that DD is of type Any[] (since you declared it that way). Working with Any can incur quite a performance penalty so declaring DD = Matrix{Float64}[] cuts the time to a third in my testing.

I'm not sure how close this example is to the actual code you want to write but in this particular case the size(DD[i],1) == 1 branch can be replaced by a call to LinearAlgebra.dot:

y[1:size(DD[i],1),i] .= dot(v_1, DD[i])

this cuts the time by another 50% for me. Finally you can squeeze out just a tiny bit more by using mul! to perform the other multiplication in place:

mul!(view(y, 1:size(DD[i],1),i:i), DD[i], v_2')

Full example:

using Random
using LinearAlgebra

DD = [rand(i,2) for _ in 1:100 for i in 1:2]
y  = zeros(2,200)
shuffle!(DD)

function test(y, DD, n)
    v_1   = [1 2]
    v_2   = [3 4]'
    for j in 1:n
        for i in 1:size(DD,1)
            if size(DD[i],1) == 1
                y[1:size(DD[i],1),i] .= dot(v_1, DD[i])
            else
                mul!(view(y, 1:size(DD[i],1),i:i), DD[i], v_2)
            end
        end
    end
end

Optimizing matrix multiplication with varying sizes

1 Answers