You could also do this completely vectorized by using mat2cell
, cellfun
, then cell2mat
. Assuming our matrix is stored in A
, try:
numBlocks = size(A,1) / 3;
B = mat2cell(A, 3*ones(1,numBlocks), 3);
C = cellfun(@(x) x - x([1 1 1], :), B, 'UniformOutput', false);
D = cell2mat(C); %//Output
The first line figures out how many 3 x 3 blocks we need. This is assuming that the number of rows is a multiple of 3. The second line uses mat2cell
to decompose each 3 x 3 block and places them into individual cells. The third line then uses cellfun
so that for each cell in our cell array (which is a 3 x 3 matrix), it takes each row of the 3 x 3 matrix and subtracts itself with the first row. This is very much like what @David did, except I didn't use repmat
to minimize overhead. The fourth line then takes each of these matrices and stacks them back so that we get our final matrix in the end.
Example (this is using the matrix that was defined in your post):
A = [1 3 5; 2 3 6; 1 1 1; 3 5 4; 5 5 5; 8 8 0];
numBlocks = size(A,1) / 3;
B = mat2cell(A, 3*ones(1, numBlocks), 3);
C = cellfun(@(x) x - x([1 1 1], :), B, 'UniformOutput', false);
D = cell2mat(C);
Output:
D =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
In hindsight, I think @David is right with respect to performance gains. Unless this code is repeated many times, I think the for
loop will be more efficient. Either way, I wanted to provide another alternative. Cool exercise!
Edit: Timing and Size Tests
Because of our discussion earlier, I have decided to do timing and size tests. These tests were performed on an Intel i7-4770 @ 3.40 GHz CPU with 16 GB of RAM, using MATLAB R2014a on Windows 7 Ultimate. Basically, I did the following:
Test #1 - Set the random seed generator to 1 for reproducibility. I wrote a loop that cycled 10000 times. For each iteration in the loop, I generated a random integer 3000 x 3 matrix, then performed each of the methods that were described here. I took note of how long it took for each method to complete after 10000 cycles. The timing results are:
- David's method:
0.092129 seconds
- rayryeng's method:
1.9828 seconds
- natan's method:
0.20097 seconds
- natan's
bsxfun
method: 0.10972 seconds
- Divakar's
bsxfun
method: 0.0689 seconds
As such, Divakar's method is the fastest, followed by David's for
loop method, followed closely by natan's bsxfun
method, followed by natan's original kron
method, followed by the sloth (a.k.a mine).
- Test #2 - I decided to see how fast this would get as you increase the size of the matrix. The set up was as follows. I did 1000 iterations, and at each iteration, I increase the size of the matrix rows by 3000 each time. As such, iteration 1 consisted of a 3000 x 3 matrix, the next iteration consisted of a 6000 x 3 matrix and so on. The random seed was set to 1 again. At each iteration, the time taken to complete the code was taken a note of. To ensure fairness, the variables were cleared at each iteration before the processing code began. As such, here is a
stem
plot that shows you the timing for each size of matrix. I subsetted the plot so that it displays timings from 200000 x 3 to 300000 x 3. Take note that the horizontal axis records the number of rows at each iteration. The first stem is for 3000 rows, the next is for 6000 rows and so on. The columns remain the same at 3 (of course).
I can't explain the random spikes throughout the graph.... probably attributed to something happening in RAM. However, I'm very sure I'm clearing the variables at each iteration to ensure no bias. In any case, Divakar and David are closely tied. Next comes natan's bsxfun
method, then natan's kron
method, followed last by mine. Interesting to see how Divakar's bsxfun
method and David's for
method are side-by-side in timing.
- Test #3 - I repeated what I did for Test #2, but using natan's suggestion, I decided to go on a logarithmic scale. I did 6 iterations, starting at a 3000 x 3 matrix, and increasing the rows by 10 fold after. As such, the second iteration had 30000 x 3, the third iteration had 300000 x 3 and so on, up until the last iteration, which is 3e8 x 3.
I have plotted on a semi-logarithmic scale on the horizontal axis, while the vertical axis is still a linear scale. Again, the horizontal axis describes the number of rows in the matrix.
I changed the vertical limits so we can see most of the methods. My method is so poor performing that it would squash the other timings towards the lower end of the graph. As such, I changed the viewing limits to take my method out of the picture. Essentially what was seen in Test #2 is verified here.