MATLAB: Create smaller matrix without “side NaN” columns

Question

I'm working with a 30-by-26000 matrix whose rows have NaNs at the beginning and at the end, with more NaNs sprinkled throughout each row. I can fill in the interior NaNs with linear interpolation, but that still leaves NaNs at the beginning and end of each row. Extrapolating to replace those NaNs at the ends is not ideal for my data set.

I want to just trim the matrix. Take for example a 3 by 6 matrix:

NaN NaN 1 2  3  NaN
NaN  1  2 3 NaN NaN
 1  NaN 2 3  4   5

Cut off the leftmost and rightmost columns so that no row begins or ends with a NaN:

1 2
2 3
2 3

So we are left with a 3 by 2 matrix.

How can I do this in Matlab? (speed-optimized; I will need to apply this to a million size matrix)

Thanks!

Colin T Bowers · Accepted Answer · 2012-11-22T02:01:14

Firstly, argyris's vectorized solution works perfectly well (+1). I'm only posting this because you emphasized that you wanted a speed-optimized solution. The downside of argyris's solution is that the sum and isnan operations are performed on the entire matrix. That is fine if you have to come a long way in from either side to find the first NaN-free column. But what if you don't? A loop-based solution that stops as soon as it reaches a NaN-free column, and so may only need to inspect a few columns, can do better (particularly given how good the JIT accelerator is getting at executing single loops quickly). I've put together a speed test that includes both argyris's solution and mine:

%#Set up an example case using the matrix size you indicated in the question
T = 30;
N = 26000;
X = rand(T, N);
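TrueL = 8;                 %#Number of all-NaN columns on the left
TrueR = N - 8;             %#Index of the last NaN-free column
X(:, 1:TrueL) = NaN;
X(:, TrueR+1:end) = NaN;   %#Exactly 8 all-NaN columns on the right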

%#argyris solution
tic
I1 = sum(isnan(X));
argL = find(I1 == 0, 1, 'first');
argR = find(I1 == 0, 1, 'last');
Soln1 = X(:, argL:argR);
toc

%#My loop based solution (faster if TrueL and TrueR are small)
tic
for n = 1:N
    if ~any(isnan(X(:, n)))
        break
    end
end
ColinL = n;
for n = N:-1:1
    if ~any(isnan(X(:, n)))
        break
    end
end
ColinR = n;
Soln2 = X(:, ColinL:ColinR);
toc

In the above example, the solution will need to get rid of the first 8 and last 8 columns. The outcome of the speed test?

Elapsed time is 0.002919 seconds. %#argyris solution
Elapsed time is 0.001007 seconds. %#My solution

The loop based solution is almost 3 times faster. Okay, now let's up the number of columns that we need to get rid of on either side to 100:

Elapsed time is 0.002769 seconds. %#argyris solution
Elapsed time is 0.001999 seconds. %#My solution

Still ahead. What about 1000 columns on either side?

Elapsed time is 0.003597 seconds. %#argyris solution
Elapsed time is 0.003719 seconds. %#My solution

So we've found our tipping point (on my machine at least - Quad core i7, Linux Mint v12, Matlab R2012b). Once we need to come in about 1000 columns on either side, we're better off using the vectorized solution.
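
The timings above only compare speed, so here is a quick correctness check as well: a minimal sketch that runs both approaches on the 3-by-6 matrix from the question and confirms they return the same 3-by-2 result. The variable names (good, vecL, loopL and so on) are mine, chosen for illustration, and the vectorized part uses ~any(isnan(X), 1), which should be equivalent to testing sum(isnan(X)) == 0.

```matlab
% The 3-by-6 example from the question
X = [NaN NaN 1 2   3 NaN;
     NaN   1 2 3 NaN NaN;
       1 NaN 2 3   4   5];

% Vectorized approach: flag the columns that contain no NaN at all
good = ~any(isnan(X), 1);
vecL = find(good, 1, 'first');
vecR = find(good, 1, 'last');
SolnVec = X(:, vecL:vecR);

% Loop-based approach: walk inwards from each side and stop at the
% first NaN-free column
N = size(X, 2);
for n = 1:N
    if ~any(isnan(X(:, n)))
        break
    end
end
loopL = n;
for n = N:-1:1
    if ~any(isnan(X(:, n)))
        break
    end
end
loopR = n;
SolnLoop = X(:, loopL:loopR);

disp(SolnVec)                        % the trimmed matrix
assert(isequal(SolnVec, SolnLoop))   % both approaches agree
```

Note that both approaches only trim the sides. Any NaN-containing columns between the first and last NaN-free column are kept, which is what you want if the interior NaNs are to be interpolated afterwards.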

One final note of CAUTION: If the routine runs inside another (possibly unrelated) loop, then the speed comparisons should be redone, because my solution then involves a double loop. Even if the loops are unrelated, the JIT accelerator is not so good with double loops. I did some quick tests on my machine, and my solution still comes out ahead when only a small number of columns has to be trimmed on each side (i.e. fewer than about 100), but the advantage is not as large as it was without the outer loop.
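
Since both the tipping point and the nested-loop effect depend on the machine and MATLAB version, it is probably worth re-measuring on your own setup. The following is a rough sketch of how that could be done: it sweeps the number of all-NaN columns per side and reports the median of several tic/toc measurements for each approach. The values in ks and nReps are illustrative assumptions, not settings from my original tests, and the variable names are made up for this sketch.

```matlab
% Sketch: sweep the number of all-NaN columns on each side and time both
% approaches. ks and nReps are illustrative choices.
T = 30;
N = 26000;
ks = [8 100 1000 5000];   % all-NaN columns per side to try
nReps = 20;               % repetitions per measurement

for k = ks
    X = rand(T, N);
    X(:, 1:k) = NaN;
    X(:, N-k+1:end) = NaN;

    tVec = zeros(nReps, 1);
    tLoop = zeros(nReps, 1);
    for r = 1:nReps
        % Vectorized: inspects every column
        t0 = tic;
        good = ~any(isnan(X), 1);
        vL = find(good, 1, 'first');
        vR = find(good, 1, 'last');
        S1 = X(:, vL:vR);
        tVec(r) = toc(t0);

        % Loop-based: stops at the first NaN-free column from each side
        t0 = tic;
        for lL = 1:N
            if ~any(isnan(X(:, lL))), break, end
        end
        for lR = N:-1:1
            if ~any(isnan(X(:, lR))), break, end
        end
        S2 = X(:, lL:lR);
        tLoop(r) = toc(t0);
    end

    assert(isequal(S1, S2));   % both approaches must agree
    fprintf('k = %5d: vectorized %.6f s, loop %.6f s (medians)\n', ...
            k, median(tVec), median(tLoop));
end
```

Be aware that the column scans here sit inside the sweep and repetition loops, so this measures the nested-loop situation described in the caution above rather than the single-loop case from the original speed test.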

Anyway, hope this proves useful to you or anyone else who comes a-reading.

Cheers!

EDIT: I've done a few speed tests incorporating angainor's very neat one-liner (+1). It performs almost as well as my loop-based solution when the number of columns to be removed is small. Surprisingly, unlike argyris's solution, it didn't scale that well when the number of columns to be removed is large. That may have something to do with the computer I'm on now though: work Windows machine - I've never really trusted it fully :-)
