I have two matrices. One is of size 1,000,000 x 9 and the other is 500,000 x 9.
The columns have the same meaning in both matrices: the first 7 columns act as a key, and the last two columns hold data values. Many keys occur in both matrices, and I would like to build one big matrix so that I can compare the data values side by side. This big matrix should have dimensions 1,000,000 x 11.
For example:
A = [0 0 0 0 0 0 0 10 20; 0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];
A merged matrix would look like this:
C = [0 0 0 0 0 0 0 10 20 50 60; 0 0 0 0 0 0 1 30 40 0 0];
As you can see, columns 8 and 9 of the first row of C come from matrix A, and columns 10 and 11 come from columns 8 and 9 of matrix B. The second row takes columns 8 and 9 from matrix A and uses 0, 0 for the last two columns, because there is no row with a matching key in matrix B.
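To make the expected result completely unambiguous, here is a simplified sketch of the kind of loop I currently use (just the idea, not my real code):

% Naive loop-based merge (slow at full scale); assumes A and B as defined above.
C = [A, zeros(size(A, 1), 2)];                        % start with A plus two zero columns
for i = 1:size(A, 1)
    match = all(bsxfun(@eq, B(:, 1:7), A(i, 1:7)), 2);    % rows of B with the same key
    if any(match)
        C(i, 10:11) = B(find(match, 1), 8:9);         % copy B's two data columns
    end
end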
I have a working implementation, but it is very, very slow, because it relies heavily on loops, essentially like the naive sketch above. In any other programming language I would sort both tables and then walk through them in one big loop, keeping two pointers into the two tables.
Is there a more efficient approach available in MATLAB using vectorization, or at least one that is sufficiently fast and idiomatic/short?
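For reference, the sorted two-pointer merge I have in mind would look roughly like this in MATLAB (an untested sketch; sortrows and the explicit while loop are just my translation of the idea, and I suspect the interpreted loop is exactly what I want to avoid):

% Untested sketch of the sorted two-pointer merge described above.
[As, ia] = sortrows(A, 1:7);             % sort by the 7 key columns; As = A(ia, :)
Bs = sortrows(B, 1:7);
extra = zeros(size(A, 1), 2);            % B's data columns, aligned with the sorted A
i = 1; j = 1;
while i <= size(As, 1) && j <= size(Bs, 1)
    d = find(As(i, 1:7) ~= Bs(j, 1:7), 1);    % first key column where the rows differ
    if isempty(d)                             % keys are identical: copy B's data columns
        extra(i, :) = Bs(j, 8:9);
        i = i + 1;
    elseif As(i, d) < Bs(j, d)                % A's key is smaller: advance the A pointer
        i = i + 1;
    else                                      % B's key is smaller: advance the B pointer
        j = j + 1;
    end
end
C = zeros(size(A, 1), 11);
C(ia, :) = [As, extra];                  % put the rows back into A's original order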
(Additional note: my biggest issue seems to be the lookup. Given my matrix, I would like to pass in one key vector with 7 elements, let's call it key (a 1x7 row vector, so that it can be expanded against the 7 key columns), and find the corresponding row. Right now I use bsxfun for that:
targetRow = data( min(bsxfun(@eq, data(:, 1:7), key), [], 2) == 1, :);
I use min because the result of bsxfun is an N x 7 logical matrix with 7 match flags per row, and I obviously want all of them to be true. It seems to me that this lookup could be the bottleneck of any MATLAB implementation.)
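For concreteness, on the toy data above the lookup would be used like this:

data = A;                                 % any matrix whose first 7 columns form the key
key  = [0 0 0 0 0 0 1];                   % 1x7 row vector with the key we are looking for
% bsxfun compares key against every row; the min over dimension 2 is 1 only
% where all 7 key columns match, so this picks out the matching row(s).
targetRow = data( min(bsxfun(@eq, data(:, 1:7), key), [], 2) == 1, :);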
You can use all(X,2) over min(X,[],2)==1 for a logical array. Not sure if it is any faster though. – Florian
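In code, that suggested change would read something like this (untested):

% all(...) along dimension 2 is true only where every one of the 7 key columns matches
targetRow = data( all(bsxfun(@eq, data(:, 1:7), key), 2), :);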
Use the profiler to display which lines / sections of your code are using the most time: uk.mathworks.com/help/matlab/ref/profile.html Namely, profile on; <yourCodeHere>; profile viewer; – Wolfie
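A minimal profiling session would look something like this (mergeKeyed is just a placeholder name for whatever function performs the merge):

profile on;                  % start collecting timing data
C = mergeKeyed(A, B);        % placeholder call: run the code under investigation
profile viewer;              % open the report showing time spent per line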