2
votes

I have a very simple function written in Julia that needs to be run millions of times.

Code:

function randfunc()
  #rands could be a global variable if desired
  rands = rand(200,100000)
  for i=1:100000
    #the alphabet can continue (and in Q1 as well)
    # rand(1:100000) generates a random number within the number of columns in rands
    a = rand(1:100000) 
    b = rand(1:100000)
    c = rand(1:100000)
    d = rand(1:100000)
    e = rand(1:100000)
    Q1 = hcat(rands[:,a],rands[:,b],rands[:,c],rands[:,d],rands[:,e])
    Q2 = *(Q1.',Q1)
  end
end

Is there any way to speed up the hcat function or replace it with something more efficient?

Another way to speed up this function would be to manually do the matrix multiplication without constructing the Q1 matrix, but the built-in *(,) operator runs faster than using +'s and *'s, at least on my attempt, and doing this seems to have more overhead than simply constructing Q1.

Using rands = convert(Array{Float32}, rands) helps a bit, but Float16 is actually worse (especially for matrix multiplication). The elements in rands cannot be strictly integers, and the number of column vectors is arbitrary in Q1.

Edit: The initial concept of this question was to try to obtain a quick way of calling a matrix from data that would later be part of a matrix multiplication with its transpose. I have edited the code to try to address any ambiguity.

Old Code:

function randfunc()
  #rands could be a global variable if desired
  rands = rand(200,100000)
  for i=1:100000-1
    Q1 = hcat(rands[:,[i]],rands[:,[i]].*rands[:,[i+1]])
    Q2 = *(Q1.',Q1)
  end
end
2
What does the function return? (it seems to be nothing) - Dan Getz
I cut off the code because everything after that was pretty well optimized. If you need the function to return something, you can have it return det(Q2). - evoclue
Which Q2? There are 100000-1 different Q2 calculated in the loop? - Dan Getz
All of them. The Q2s can be written into a pre-allocated vector. - evoclue
Posted an answer below. If you verify the correctness of the calculated values it would be nice to know. - Dan Getz

2 Answers

5
votes

This code tries to optimize the function by simplifying the calculation into two nested for loops. Basically, this calculates the same expression the original matrix form does, by expanding the expressions in terms of the input rands matrix elements. The return value is a vector of the determinants of the Q2 matrices.

function randfunc2(rands)
  a,b,c = 0.0,0.0,0.0
  detvec = Vector{Float64}(size(rands,2)-1)
  @inbounds for i=1:(size(rands,2)-1)
    a,b,c = 0.0,0.0,0.0
    @inbounds for j=1:size(rands,1)
      s = rands[j,i]
      t = s*s
      u = s*rands[j,i+1]
      a += t
      b += s*u
      c += u*u
    end
    detvec[i] = a*c-b*b    # calc determinant using 2x2 det formula
    # Q2 = [a b; b c]
  end
  return detvec
end

To use this function, as in the example in the question:

rands = rand(200,100000)
randfunc2(rands)        # this returns the determinant vectors.

The allocations are very minimal and each matrix element is accessed twice but in column order (which is the fast order).

3
votes

Breaking your code into smaller pieces + using views helped bring down allocations about 60%:

function randfunc()
    #rands could be a global variable if desired
    n=100000
    rands = rand(200,n)
    for i=1:n-1
        Q1 = hcat(rands[:,[i]],rands[:,[i]].*rands[:,[i+1]])
        Q2 = *(Q1.',Q1)
    end
end

function Q1(r1,r2)
   hcat(r1,r1.*r2) 
end
function some_mult(Q)
    *(Q',Q)
end
function randfunc2()
    n=100000
    #rands could be a global variable if desired
    rands = rand(200,n)
    for i=1:n-1
        Q2 = some_mult(Q1(
            view(rands,:,[i]),
            view(rands,:,[i+1])))
    end
end
@time randfunc()  #1.883301 seconds (3.20 M allocations: 1.228 GB, 30.84% gc time)
@time randfunc2() #1.656203 seconds (2.90 M allocations: 736.978 MB, 31.27% gc time)