What is the most efficient way to construct large block matrices in Mathematica?

Question

Inspired by Mike Bantegui's question on constructing a matrix defined as a recurrence relation, I wonder if there is any general guidance that could be given on setting up large block matrices in the least computation time. In my experience, constructing the blocks and then putting them together can be quite inefficient (thus my answer was actually slower than Mike's original code). Join and possibly ArrayFlatten are possibly less efficient than they could be.

Obviously if the matrix is sparse, one can use SparseMatrix constructs, but there will be times when the block matrix you are constructing is not sparse.

What is best practice for this kind of problem? I am assuming the elements of the matrix are numeric.

just using Do loops and compiling can be quite fast, especially in version 8 which gives the option of compiling to C. But I have little experience of this so can't offer general opinions — acl
I have to agree with acl here. The naive Do loop was actually faster than the functional version, especially for large matrix sizes. When compiled, the Do loop became significantly faster. One interesting thing was the fact that something along the lines of a + b, where a and b were matrices, was really slow. — Mike Bailey
Yes, this is why I posed the question. Long-time Mathematica users have all been brought up to "say no to loops", and the idea that functional programming styles were more efficient in Mathematica. This was the stock response when people complained that Mma was slower than Matlab. But if things have changed over recent versions, I thought it worthwhile documenting that. — Verbeia
@Verbeia: I agree. My answer actually has some rather interesting examples of how this plays out. None of the examples I used below were arbitrary toy examples. All of them are used in my production code for some rather long and expensive calculations. — Mike Bailey
I discussed one example that might be of interest to you in my book, here: mathprogramming-intro.org/book/node553.html (The introduction is alas missing in the web version, but it is present in the pdf). I was generating block random matrices with a certain symmetry, and in that example I compared different techniques for doing that. The problem however was a bit different: each matrix in my case was not necessarily very large, but I had to generate a large number of them. Also, I did it in M5-M6, long before the things like compilation to C became available. — Leonid Shifrin

Mike Bailey Mike Bailey · Accepted Answer · 2011-07-29T00:54:08

The code shown below is available here: http://pastebin.com/4PWWxGhB. Just copy and paste it into a notebook to test it out.

I was actually trying to do several functional ways of calculating matrices, since I figured the functional way (which is typically idiomatic in Mathematica) is more efficient.

As one example, I had this matrix which was composed of two lists:

In: L = 1200;
e = Table[..., {2L}];
f = Table[..., {2L}];

h = Table[0, {2L}, {2L}];
Do[h[[i, i]] = e[[i]], {i, 1, L}];
Do[h[[i, i]] = e[[i-L]], {i, L+1, 2L}];
Do[h[[i, j]] = f[[i]]f[[j-L]], {i, 1, L}, {j, L+1, 2L}];
Do[h[[i, j]] = h[[j, i]], {i, 1, 2 L}, {j, 1, i}];

My first step was to time everything.

In: h = Table[0, {2 L}, {2 L}];
AbsoluteTiming[Do[h[[i, i]] = e[[i]], {i, 1, L}];]
AbsoluteTiming[Do[h[[i, i]] = e[[i - L]], {i, L + 1, 2 L}];]
AbsoluteTiming[
 Do[h[[i, j]] = f[[i]] f[[j - L]], {i, 1, L}, {j, L + 1, 2 L}];]
AbsoluteTiming[Do[h[[i, j]] = h[[j, i]], {i, 1, 2 L}, {j, 1, i}];]

Out: {0.0020001, Null}
{0.0030002, Null}
{5.0012861, Null}
{4.0622324, Null}

DiagonalMatrix[...] was slower than the do loops, so I decided to just use Do loops on the last step. As you can see, using Outer[Times, f, f] was much faster in this case.

I then wrote the equivalent using Outer for the blocks in the upper right and bottom left of the matrix, and DiagonalMatrix for the diagonal:

AbsoluteTiming[h1 = ArrayPad[Outer[Times, f, f], {{0, L}, {L, 0}}];]
AbsoluteTiming[h1 += Transpose[h1];]
AbsoluteTiming[h1 += DiagonalMatrix[Join[e, e]];]


Out: {0.9960570, Null}
{0.3770216, Null}
{0.0160009, Null}

The DiagonalMatrix was actually slower. I could replace this with just the Do loops, but I kept it because it was cleaner looking.

The current tally is 9.06 seconds for the naive Do loop, and 1.389 seconds for my next version using Outer and DiagonalMatrix. About a 6.5 times speedup, not too bad.

Sounds a lot faster, now doesn't it? Let's try using Compile now.

In: cf = Compile[{{L, _Integer}, {e, _Real, 1}, {f, _Real, 1}},
   Module[{h},
    h = Table[0.0, {2 L}, {2 L}];
    Do[h[[i, i]] = e[[i]], {i, 1, L}];
    Do[h[[i, i]] = e[[i - L]], {i, L + 1, 2 L}];
    Do[h[[i, j]] = f[[i]] f[[j - L]], {i, 1, L}, {j, L + 1, 2 L}];
    Do[h[[i, j]] = h[[j, i]], {i, 1, 2 L}, {j, 1, i}];
    h]];

AbsoluteTiming[cf[L, e, f];]

Out: {0.3940225, Null}

Now it's running 3.56 times faster than my last version, and 23.23 times faster than the first one. Next version:

In: cf = Compile[{{L, _Integer}, {e, _Real, 1}, {f, _Real, 1}},
   Module[{h},
    h = Table[0.0, {2 L}, {2 L}];
    Do[h[[i, i]] = e[[i]], {i, 1, L}];
    Do[h[[i, i]] = e[[i - L]], {i, L + 1, 2 L}];
    Do[h[[i, j]] = f[[i]] f[[j - L]], {i, 1, L}, {j, L + 1, 2 L}];
    Do[h[[i, j]] = h[[j, i]], {i, 1, 2 L}, {j, 1, i}];
    h], CompilationTarget->"C", RuntimeOptions->"Speed"];

AbsoluteTiming[cf[L, e, f];]

Out: {0.1370079, Null}

Most of the speed came from CompilationTarget->"C". Here I got another 2.84 speedup over the fastest version, and 66.13 times speedup over the first version. But all I did was just compile it!

Now, this is a very simple example. But this is real code I'm using to solve a problem in condensed matter physics. So don't dismiss it as possibly being a "toy example."

How's about another example of a technique we can use? I have a relatively simple matrix I have to build up. I have a matrix that's composed of nothing but ones from the start to some arbitrary point. The naive way may look something like this:

In: k = L;
AbsoluteTiming[p = Table[If[i == j && j <= k, 1, 0], {i, 2L}, {j, 2L}];]
Out: {5.5393168, Null}

Instead, let's build it up using ArrayPad and IdentityMatrix:

In: AbsoluteTiming[ArrayPad[IdentityMatrix[k], {{0, 2L-k}, {0, 2L-k}}
Out: {0.0140008, Null}

This actually doesn't work for k = 0, but you can special case that if you need that. Furthermore, depending on the size of k, this can be faster or slower. It's always faster than the Table[...] version though.

You could even write this using SparseArray:

In: AbsoluteTiming[SparseArray[{i_, i_} /; i <= k -> 1, {2 L, 2 L}];]
Out: {0.0040002, Null}

I could go on about some other things, but I'm afraid if I do I'll make this answer unreasonably large. I've accumulated a number of techniques for forming these various matrices and lists in the time I spent trying to optimize some code. The base code I worked with took over 6 days for one calculation to run, and now it takes only 6 hours to do the same thing.

I'll see if I can pick out the general techniques I've come up with and just stick them in a notebook to use.

TL;DR: It seems like for these cases, the functional way outperforms the procedural way. But when compiled, the procedural code outperforms the functional code.

What is the most efficient way to construct large block matrices in Mathematica?

2 Answers