Split a matrix according to a column with MATLAB

3

votes

A = [1,4,2,5,10
     2,4,5,6,2
     2,1,5,6,10
     2,3,5,4,2]

I want to split it into two matrices, B and C, according to the value in the last column:

B =  [1,4,2,5,10
      2,1,5,6,10]
C = [2,4,5,6,2
     2,3,5,4,2]

The method should also work for a larger matrix, e.g. a 100-by-22 matrix that is split into 9 groups according to the value in the last column.

Tags: matlab, matrix, split

Can you explain more about what you're trying to achieve? – Eitan T

5

votes

Use logical indexing

B=A(A(:,end)==10,:);
C=A(A(:,end)==2,:);

returns

>> B
B =
     1     4     2     5    10
     2     1     5     6    10

>> C
C =
     2     4     5     6     2
     2     3     5     4     2
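
To make the indexing step explicit, the comparison can be stored in a logical mask before it is used as an index. This is only a minimal sketch of the same idea; the names `mask10` and `mask2` are hypothetical and do not appear in the answer above.

```matlab
A = [1,4,2,5,10
     2,4,5,6,2
     2,1,5,6,10
     2,3,5,4,2];

% Logical column vectors: true for rows whose last entry matches the value
mask10 = A(:,end) == 10;   % hypothetical name
mask2  = A(:,end) == 2;    % hypothetical name

% Indexing with a logical mask keeps only the rows marked true
B = A(mask10,:);
C = A(mask2,:);
```

Each mask has one entry per row of A, so it can be reused, combined with `&` or `|`, or counted with `nnz` before extracting the rows.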

EDIT: In reply to Dan's comment, here is the extension to the general case:

e = unique(A(:,end));
B = cell(size(e));
for k = 1:numel(e)
    B{k} = A(A(:,end)==e(k),:);
end

or, more compactly:

B=arrayfun(@(x) A(A(:,end)==x,:), unique(A(:,end)), 'UniformOutput', false);

so for

A =
     1     4     2     5    10
     2     4     5     6     2
     2     1     5     6    10
     2     3     5     4     2
     0     3     1     4     9
     1     3     4     5     1
     1     0     4     5     9
     1     2     4     3     1

you get the matrices in elements of cell array B

>> B{1}
ans =
     1     3     4     5     1
     1     2     4     3     1

>> B{2}
ans =
     2     4     5     6     2
     2     3     5     4     2

>> B{3}
ans =
     0     3     1     4     9
     1     0     4     5     9

>> B{4}
ans =
     1     4     2     5    10
     2     1     5     6    10
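
As a rough sketch of how the same loop might be applied to the 100-by-22 case from the question, the snippet below builds hypothetical random test data (the real data is not shown in the question) and checks that every row lands in exactly one group. The name `totalRows` is an assumption introduced for illustration.

```matlab
% Hypothetical test data: 100 rows, 22 columns, last column holds a label 1..9
A = [rand(100, 21), randi(9, 100, 1)];

e = unique(A(:,end));          % distinct labels, sorted ascending (at most 9)
B = cell(size(e));             % one cell per group
for k = 1:numel(e)
    B{k} = A(A(:,end) == e(k), :);
end

% Sanity check: the group sizes should add up to the number of rows in A
totalRows = sum(cellfun(@(M) size(M,1), B));
```

Keeping `e` alongside `B` is useful because `e(k)` records which last-column value the matrix `B{k}` belongs to.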

3

votes

Here is a general approach that works for any number of distinct values in the last column and for a matrix of any size:

A = [1,4,2,5,10
     2,4,5,6,2
     1,1,1,1,1
     2,1,5,6,10
     2,3,5,4,2
     0,0,0,0,2];

First sort by the last column (there are many ways to do this; I don't know whether this is the best one):

[~, order] = sort(A(:,end));
As = A(order,:);

Then create a vector of how many rows share the same value in the last column (i.e. how many rows there are per group):

rowDist = diff(find([1; diff(As(:, end)); 1]));

Note that for my example data rowDist will equal [1; 3; 2], as there is one 1, three 2s and two 10s. Now use mat2cell to split by these row groupings:

Ac = mat2cell(As, rowDist);
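
The rowDist expression is compact, so here is a step-by-step sketch of what it computes for the sorted last column of the example data. The intermediate names `lastCol`, `isStart` and `starts` are hypothetical and only serve to break the one-liner apart.

```matlab
lastCol = [1; 2; 2; 2; 10; 10];     % sorted last column of the example

% diff is nonzero wherever the value changes. The leading 1 marks the first
% row as a group start; the trailing 1 is an end marker one past the last row.
isStart = [1; diff(lastCol); 1];    % [1; 1; 0; 0; 8; 0; 1]

starts  = find(isStart);            % [1; 2; 5; 7]
rowDist = diff(starts);             % [1; 3; 2]
```

The differences between consecutive start positions are the group sizes, which is the form mat2cell expects for its row-split argument.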

If you really want to you can now split it into separate matrices (but I doubt you would)

Ac{:}

results in (the order of rows within a group depends on how the sort orders tied values)

ans =

   1   1   1   1   1

ans =

   0   0   0   0   2
   2   3   5   4   2
   2   4   5   6   2

ans =

    1    4    2    5   10
    2    1    5    6   10

But I think you would find Ac itself more useful

EDIT:

Many solutions so might as well do a time comparison:

A = [...
     1     4     2     5    10
     2     4     5     6     2
     2     1     5     6    10
     2     3     5     4     2
     0     3     1     4     9
     1     3     4     5     3
     1     0     4     5     9
     1     2     4     3     1];

A = repmat(A, 1000, 1);

tic
for l = 1:100
  [~, y] = sort(A(:,end));
  As = A(y,:);
  rowDist = diff(find([1; diff(As(:, end)); 1]));
  Ac = mat2cell(As, rowDist);
end
toc

tic
for l = 1:100
  D=arrayfun(@(x) A(A(:,end)==x,:), unique(A(:,end)), 'UniformOutput', false);
end
toc

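e = unique(A(:,end));   % group labels for the current A (e was not defined in this script)
B = cell(size(e));      % preallocate the output cell array
tic
for l = 1:100
  for k = 1:numel(e)
      B{k} = A(A(:,end)==e(k),:);
  end
end
toc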

tic
for l = 1:100
  Bb = sort(A(:,end)); 
  [~,b] = histc(A(:,end), Bb([diff(Bb)>0;true]));
  C = accumarray(b, (1:size(A,1))', [], @(r) {A(r,:)} );
end
toc

resulted in

Elapsed time is 0.053452 seconds.
Elapsed time is 0.17017 seconds.
Elapsed time is 0.004081 seconds.
Elapsed time is 0.22069 seconds.

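So even for a large matrix, the loop method is still the fastest in this test. Note that the timed loop does not include the call to unique, whereas the arrayfun version calls it on every iteration.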
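
As an alternative to hand-written tic/toc loops, the comparison could be sketched with timeit, which runs a function handle several times and returns a typical time in seconds. This assumes a MATLAB release that ships timeit; the handle names `fArrayfun` and `fSort` are hypothetical, and the numbers it produces will differ from machine to machine.

```matlab
A = repmat([1 4 2 5 10; 2 4 5 6 2; 2 1 5 6 10; 2 3 5 4 2], 1000, 1);

% Wrap each candidate in a zero-argument function handle
fArrayfun = @() arrayfun(@(x) A(A(:,end)==x,:), unique(A(:,end)), ...
                         'UniformOutput', false);

% Sort-and-split variant: sort rows by the last column, then cut with mat2cell
fSort = @() mat2cell(sortrows(A, size(A,2)), ...
                     diff(find([1; diff(sort(A(:,end))); 1])));

tArrayfun = timeit(fArrayfun);   % seconds per call
tSort     = timeit(fSort);       % seconds per call
```

Because each handle includes all of its own setup (such as the call to unique), the two timings cover the same amount of work.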

1

votes

Use accumarray in combination with histc:

% Example data (from Mohsen Nosratinia)
A = [...
     1     4     2     5    10
     2     4     5     6     2
     2     1     5     6    10
     2     3     5     4     2
     0     3     1     4     9
     1     3     4     5     1
     1     0     4     5     9
     1     2     4     3     1];

% Get the proper indices to the specific rows
B = sort(A(:,end)); 
[~,b] = histc(A(:,end), B([diff(B)>0;true]));

% Collect all specific rows in their specific groups
C = accumarray(b, (1:size(A,1))', [], @(r) {A(r,:)} );

Results:

>> C{:}
ans =
     1     3     4     5     1
     1     2     4     3     1
ans =
     2     3     5     4     2
     2     4     5     6     2
ans =
     0     3     1     4     9
     1     0     4     5     9
ans =
     2     1     5     6    10
     1     4     2     5    10

Note that

B = sort(A(:,end)); 
[~,b] = histc(A(:,end), B([diff(B)>0;true]));

can also be written as

[~,b] = histc(A(:,end), unique(A(:,end)));

but unique is not built-in and is therefore likely to be slower, especially when this is all used in a loop.

Note also that the order of the rows has changed w.r.t. the order they had in the original matrix. If the order matters, you'll have to throw in another sort:

C = accumarray(b, (1:size(A,1))', [], @(r) {A(sort(r),:)} );
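
Since histc is flagged as not recommended in newer MATLAB documentation, one possible variant is to take the group indices from the third output of unique instead. This is a sketch of that swap, not the method of the answer above; the name `labels` is introduced here for illustration.

```matlab
A = [...
     1     4     2     5    10
     2     4     5     6     2
     2     1     5     6    10
     2     3     5     4     2
     0     3     1     4     9
     1     3     4     5     1
     1     0     4     5     9
     1     2     4     3     1];

% labels: sorted distinct values; b(i): index into labels for row i of A
[labels, ~, b] = unique(A(:,end));

% Collect row indices per group; sort(r) keeps the original row order
C = accumarray(b, (1:size(A,1))', [], @(r) {A(sort(r),:)});
```

Here `labels(k)` is the last-column value shared by all rows of `C{k}`, so the two outputs can be used together.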
