5
votes

Appending rows/columns to a matrix in MATLAB seems quite something to consider. For example, when I tried appending a column to a matrix A which has many rows and many columns, like

A = [A, added_col]

Matlab will warn me that since this has to make a copy of A in the memory, I'd better use pre-allocation for speed. This is understandable, because the underlying data in A occupies a contiguous block of memory.

My question is, will removing rows/columns cause similar issues? For example, to remove the second row of A:

A(2,:) = []

Is this operation in-place or not? I really feel unsure, since for one thing it doesn't seem to make any new room for data in the memory, and for another, the rows of A will be stored non-contiguously (since row 2 is removed).

So what will happen internally? And is this operation efficient enough to use in practice? Thanks!


Just tested it with a complexity of 100000:

clc; clear;
N = 100000;
A = zeros(N, 3);

t1 = tic;
for ii = 1:N
    A(ii, :) = [1 2 3];
end
t2 = toc;

And

clc; clear;
N = 100000;
A = zeros(N, 3);

t1 = tic;
for ii = (N-1):-1:2
    A(ii, :) = [];
end
t2 = toc;

Results: 0.009s for the first (modifying preallocated matrix) and 53.429s for the second (removing rows from a matrix). I think this basically settles this question: NO, removing rows/columns from a matrix is NOT efficient to use, since it definitely involves deepcopying data and reallocating memory.

Also, removing columns instead of rows isn't a good idea either. As I tested, it still takes as good as approx two minutes, on the above complexity scale:

N = 100000;
test_m = zeros(3, N);
tic
for ii = (N - 1):-1:2
    test_m(:, ii) = [];
end
toc
% result: 105.436595 seconds. 
% This was run on a different machine than the previous examples.
% But is still enough evidence that dynamically resizing a big matrix is a BAD idea.

So, end of the story: Don't try to remove columns or rows in this way, unless you have a really small matrix. For bulky matrices, always use preallocation instead.

1
Since MATLAB stores data in a column-major (columnwise) numbering scheme, removing columns should be more efficient.rahnema1
@rahnema1 thanks. But reallocation of memory is inevitable is it?Vim
Not necessarily ! It may use different approaches based on type of indexing.For example removing the first column only needs single call of memmove without reallocation. Or in a more optimized way it may not need memmove and use an offset to represent the start of data.rahnema1
@rahnema1 unfortunately, as I tested it out, this is not the case. It took as good as at least dozens of seconds to perform the column removal.Vim
It is better to provide results of your test. For example one test that removes the first column and other test that removes the last column and the same test for the first and the last row.rahnema1

1 Answers

2
votes

There are a few issues here:

  1. Whether memory gets allocated for the deletion
  2. Row major vs column major memory access
  3. The code examples
  4. preallocating modifications vs removal?

1) Without using mex, you have no control over whether matrix removal uses the same memory or not. However, you can tell if it happens or not. One approach would be to write something in mex. Alternatively you can activate format debug

N = 100000;
test_m = zeros(3, N);
t = evalc('disp(test_m)');
disp(t(1:100))
test_m(:,2:N-1) = [];
disp(t(1:100))

This yields the output

Structure address = 1259f6da0
m = 3
n = 100000
pr = 15d6d2020
pi = 0
  Columns 1 through 9

     0 

Structure address = 1259f6b70
m = 3
n = 2
pr = 608001c95320
pi = 0
     0     0
     0     0
     0

     0 

Note that to keep the display reasonable I capture the output of displaying the variable and then only show part of it. This output, particularly pr (pointer to real data) indicates that reallocation occurred. I wasn't able to find any situation where reallocation didn't occur.

2) As mentioned in some of the comments and alluded to in the question, memory is stored as column-major. Thus when you delete a column it might be more efficient than deleting a row ...

3) I'm not entirely sure if the code examples are realistic but it makes a lot more sense to remove all columns or rows in one go. This occurs really quickly.

N = 100000;
test_m = zeros(3, N);
test_m(:,2:N-1) = [];

4) Finally, I'm not sure what your wording refers to with preallocating for modifications vs removal. Big picture it is probably best to avoid removing rows or columns in a loop. Instead hold onto an array that indicates which columns or rows should be removed and then do it in one pass.