10
votes

I am working with Matlab.

I have a binary square matrix. For each row, there is one or more entries of 1. I want to go through each row of this matrix and return the index of those 1s and store them in the entry of a cell.

I was wondering if there is a way to do this without looping over all the rows of this matrix, as for loop is really slow in Matlab.

For example, my matrix

M = 0 1 0
    1 0 1
    1 1 1 

Then eventually, I want something like

A = [2]
    [1,3]
    [1,2,3]

So A is a cell.

Is there a way to achieve this goal without using for loop, with the aim of calculating the result more quickly?

5
Do you want the result to be fast or do you want the result to avoid for loops? For this problem, with modern versions of MATLAB, I strongly suspect a for loop to be the fastest solution. If you have a performance problem I suspect you're looking in the wrong place for the solution based on outdated advice.Will
@Will I want the results to be fast. My matrix is very big. The run time is about 30s in my computer by using for loop. I want to know if there are some clever vectorization operations or, mapReduce, etc that can increase the speed.ftxx
I suspect, you can't. Vectorization works on accurately described vectors and matrices, but your result allows for vectors of different lengths. Thus, my assumption is, that you'll always have some explicit loop or some loop-in-disguise like cellfun.HansHirse
@ftxx how big? And how many 1s in a typical row? I wouldn't expect a find loop to take anything close to 30s for anything small enough to fit on physical memory.Will
@ftxx Please see my updated answer, I've edited since it was accepted with a minor performance improvementWolfie

5 Answers

11
votes

At the bottom of this answer is some benchmarking code, since you clarified that you're interested in performance rather than arbitrarily avoiding for loops.

In fact, I think for loops are probably the most performant option here. Since the "new" (2015b) JIT engine was introduced (source) for loops are not inherently slow - in fact they are optimised internally.

You can see from the benchmark that the mat2cell option offered by ThomasIsCoding here is very slow...

Comparison 1

If we get rid of that line to make the scale clearer, then my splitapply method is fairly slow, obchardon's accumarray option is a bit better, but the fastest (and comparable) options are either using arrayfun (as also suggested by Thomas) or a for loop. Note that arrayfun is basically a for loop in disguise for most use-cases, so this isn't a surprising tie!

Comparison 2

I would recommend you use a for loop for increased code readability and the best performance.

Edit:

If we assume that looping is the fastest approach, we can make some optimisations around the find command.

Specifically

  • Make M logical. As the below plot shows, this can be faster for relatively small M, but slower with the trade-off of type conversion for large M.

  • Use a logical M to index an array 1:size(M,2) instead of using find. This avoids the slowest part of the loop (the find command) and outweighs the type conversion overhead, making it the quickest option.

Here is my recommendation for best performance:

function A = f_forlooplogicalindexing( M )
    M = logical(M);
    k = 1:size(M,2);
    N = size(M,1);
    A = cell(N,1);
    for r = 1:N
        A{r} = k(M(r,:));
    end
end

I've added this to the benchmark below, here is the comparison of loop-style approaches:

Comparison 3

Benchmarking code:

rng(904); % Gives OP example for randi([0,1],3)
p = 2:12; 
T = NaN( numel(p), 7 );
for ii = p
    N = 2^ii;
    M = randi([0,1],N);

    fprintf( 'N = 2^%.0f = %.0f\n', log2(N), N );

    f1 = @()f_arrayfun( M );
    f2 = @()f_mat2cell( M );
    f3 = @()f_accumarray( M );
    f4 = @()f_splitapply( M );
    f5 = @()f_forloop( M );
    f6 = @()f_forlooplogical( M );
    f7 = @()f_forlooplogicalindexing( M );

    T(ii, 1) = timeit( f1 ); 
    T(ii, 2) = timeit( f2 ); 
    T(ii, 3) = timeit( f3 ); 
    T(ii, 4) = timeit( f4 );  
    T(ii, 5) = timeit( f5 );
    T(ii, 6) = timeit( f6 );
    T(ii, 7) = timeit( f7 );
end

plot( (2.^p).', T(2:end,:) );
legend( {'arrayfun','mat2cell','accumarray','splitapply','for loop',...
         'for loop logical', 'for loop logical + indexing'} );
grid on;
xlabel( 'N, where M = random N*N matrix of 1 or 0' );
ylabel( 'Execution time (s)' );

disp( 'Done' );

function A = f_arrayfun( M )
    A = arrayfun(@(r) find(M(r,:)),1:size(M,1),'UniformOutput',false);
end
function A = f_mat2cell( M )
    [i,j] = find(M.');
    A = mat2cell(i,arrayfun(@(r) sum(j==r),min(j):max(j)));
end
function A = f_accumarray( M )
    [val,ind] = ind2sub(size(M),find(M.'));
    A = accumarray(ind,val,[],@(x) {x});
end
function A = f_splitapply( M )
    [r,c] = find(M);
    A = splitapply( @(x) {x}, c, r );
end
function A = f_forloop( M )
    N = size(M,1);
    A = cell(N,1);
    for r = 1:N
        A{r} = find(M(r,:));
    end
end
function A = f_forlooplogical( M )
    M = logical(M);
    N = size(M,1);
    A = cell(N,1);
    for r = 1:N
        A{r} = find(M(r,:));
    end
end
function A = f_forlooplogicalindexing( M )
    M = logical(M);
    k = 1:size(M,2);
    N = size(M,1);
    A = cell(N,1);
    for r = 1:N
        A{r} = k(M(r,:));
    end
end
2
votes

You can try arrayfun like below, which sweep through rows of M

A = arrayfun(@(r) find(M(r,:)),1:size(M,1),'UniformOutput',false)

A =
{
  [1,1] =  2
  [1,2] =

     1   3

  [1,3] =

     1   2   3

}

or (a slower approach by mat2cell)

[i,j] = find(M.');
A = mat2cell(i,arrayfun(@(r) sum(j==r),min(j):max(j)))

A =
{
  [1,1] =  2
  [2,1] =

     1
     3

  [3,1] =

     1
     2
     3

}
2
votes

Edit: I added a benchmark, the results show that a for loop is more efficient than accumarray.


You can usefind and accumarray:

[c, r] = find(A');
C = accumarray(r, c, [], @(v) {v'});

The matrix is transposed (A') because find groups by column.

Example:

A = [1 0 0 1 0
     0 1 0 0 0
     0 0 1 1 0
     1 0 1 0 1];

%  Find nonzero rows and colums
[c, r] = find(A');

%  Group row indices for each columns
C = accumarray(r, c, [], @(v) {v'});

% Display cell array contents
celldisp(C)

Output:

C{1} = 
     1     4

C{2} = 
     2

C{3} =
     3     4

C{4} = 
     1     3     5

Benchmark:

m = 10000;
n = 10000;

A = randi([0 1], m,n);

disp('accumarray:')
tic
[c, r] = find(A');
C = accumarray(r, c, [], @(v) {v'});
toc
disp(' ')

disp('For loop:')
tic
C = cell([size(A,1) 1]);
for i = 1:size(A,1)
    C{i} = find(A(i,:));
end
toc

Result:

accumarray:
Elapsed time is 2.407773 seconds.

For loop:
Elapsed time is 1.671387 seconds.

A for loop is more efficient than accumarray...

2
votes

Using accumarray:

M = [0 1 0
     1 0 1
     1 1 1];

[val,ind] = find(M.');

A = accumarray(ind,val,[],@(x) {x});
2
votes

You can use strfind :

A = strfind(cellstr(char(M)), char(1));