3
votes

I am trying to match the 1st column of A against the 1st to 3rd columns of B, and append the corresponding 4th-to-last columns of B to A.

For example,

A=
    1 2 
    3 4

B=
    1 2 4 5 4
    1 2 3 5 3
    1 1 1 1 2
    3 4 5 6 5

I compare A(:,1) with B(:,1:3).

The values in A(:,1) are 1 and 3.

1 appears in the 1st, 2nd and 3rd rows of B(:,1:3), so B([1 2 3], 4:end)', flattened into a row, is appended to A's 1st row. 3 appears in the 2nd and 4th rows of B(:,1:3), so B([2 4], 4:end)' is appended to A's 2nd row, padded with zeros to the same length.

A then becomes:

1 2 5 4 5 3 1 2
3 4 5 3 6 5 0 0
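
The matching rule described above can be checked by hand for the example. This is only a minimal sketch of that one step, not a full solution; the names mask1, mask3, row1 and row2 are illustrative and do not appear elsewhere in the question.

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

% Logical mask of the rows of B whose first three columns contain A(1,1) = 1
mask1 = any(B(:,1:3) == A(1,1), 2);        % rows 1, 2 and 3
% Same for A(2,1) = 3
mask3 = any(B(:,1:3) == A(2,1), 2);        % rows 2 and 4

% Columns 4:end of the matching rows, flattened row by row
row1 = reshape(B(mask1,4:end).', 1, []);   % 5 4 5 3 1 2
row2 = reshape(B(mask3,4:end).', 1, []);   % 5 3 6 5
```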

So far I could only code this with a for loop:

clearvars AA A B mem mem2 mem3

A = [1 2 ; 3 4]
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5]

for n=1:1:size(A,1)
    mem  = ismember(B(:,[1:3]), A(n,1));
    mem2 = mem(:,1) + mem(:,2) + mem(:,3);
    mem3 = find(mem2>0);

    AA{n,:} = horzcat( A(n,:), reshape(B(mem3,[4,5])',1,[]) );  %'
end

maxLength = max(cellfun(@(x)numel(x),AA));
out = cell2mat(cellfun(@(x)cat(2,x,zeros(1,maxLength-length(x))),AA,'UniformOutput',false))

I am trying to make this code more efficient by avoiding for and if, but couldn't find a way to do it.

2
Can there be zeros in A or B? - Divakar
In your definition of AA (last line inside the loop) you should use 4:end instead of [4,5]. And your code runs quite fast/efficiently. I would recommend keeping it if no faster solution is found. There is no reason to avoid loops as such; it is just that there is often a faster solution without them. - The Minion
@TheMinion: the problem is that his loop body contains ismember, which means the JIT cannot accelerate this loop effectively. For larger problems, this will become a concern. - Rody Oldenhuis
@RodyOldenhuis True. Hence the problem isn't the for-loop but the ismember() inside the loop. Still, when I ran his code and the one from Nishant, his was marginally faster even for 10000x100 entries. So I am not sure that this "problem" with ismember() really results in such runtime issues. BTW, nice solution, +1. - The Minion
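
For reference, the point raised in the comments (ismember inside the loop) can be sidestepped while keeping the loop. The following is a sketch only: it assumes the lookup value A(n,1) is a scalar, so a plain == comparison is equivalent to ismember here, and it also uses 4:end as suggested above. Whether it is actually faster has not been measured.

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

AA = cell(size(A,1), 1);                  % preallocate one cell per row of A
for n = 1:size(A,1)
    % Direct comparison against a scalar instead of ismember
    rows  = any(B(:,1:3) == A(n,1), 2);
    AA{n} = [A(n,:), reshape(B(rows,4:end).', 1, [])];
end

% Zero-pad every row to the length of the longest one, then stack
maxLength = max(cellfun(@numel, AA));
out = cell2mat(cellfun(@(x) [x, zeros(1, maxLength - numel(x))], AA, 'UniformOutput', false));
```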

2 Answers

1
votes

Try this

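a = A(:,1);                        % values to look up
b = B(:,1:3);                      % columns to search
z = size(b);
% Replicate both along a 3rd dimension so every value of a meets every entry of b
b = repmat(b,[1,1,numel(a)]);
ab = repmat(permute(a,[2,3,1]),z);
% One cell per value of a, holding the per-row match counts in B(:,1:3)
row2 = mat2cell(permute(sum(ab==b,2),[3,1,2]),ones(1,numel(a)));
% Flatten columns 4:end of the matching rows, row by row
AA = cellfun(@(x)(reshape(B(x>0,4:end)',1,numel(B(x>0,4:end)))),row2,'UniformOutput',0);
% Zero-pad to the longest row and prepend A
maxLength = max(cellfun(@(x)numel(x),AA));
out = cat(2,A,cell2mat(cellfun(@(x)cat(2,x,zeros(1,maxLength-length(x))),AA,'UniformOutput',false)))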

UPDATE: The code below runs in almost the same time as the iterative code.

a = A(:,1);
b = B(:,1:3);
z = size(b);
b = repmat(b,[1,1,numel(a)]);
ab = repmat(permute(a,[2,3,1]),z);
df = permute(sum(ab==b,2),[3,1,2])';
AA = arrayfun(@(x)(B(df(:,x)>0,4:end)),1:size(df,2),'UniformOutput',0);
AA = arrayfun(@(x)(reshape(AA{1,x}',1,numel(AA{1,x}))),1:size(AA,2),'UniformOutput',0);    
maxLength = max(arrayfun(@(x)(numel(AA{1,x})),1:size(AA,2)));
% Pass column index vectors so that cell2mat stacks the padded rows vertically
out2 = cell2mat(arrayfun(@(x,i)((cat(2,A(i,:),AA{1,x},zeros(1,maxLength-length(AA{1,x}))))),(1:numel(AA))',(1:size(A,1))','UniformOutput',0));
1
votes

How about this:

%# example data
A = [1 2
     3 4];

B = [1 2 4 5 4
     1 2 3 5 3
     1 1 1 1 2
     3 4 5 6 5];

%# rename for clarity & reshape for algorithm's convenience
needle   = permute(A(:,1), [2 3 1]);
haystack = B(:,1:3);
data     = B(:,4:end).';

%# Get the relevant rows of 'haystack' for each entry in 'needle'
inds = any(bsxfun(@eq, haystack, needle), 2);

%# Create data that should be appended to A
%# All data and functionality in this loop is local and static, so speed 
%# should be optimal.
append = zeros( size(A,1), numel(data) );
for ii = 1:size(inds,3)    
    newrow = data(:,inds(:,:,ii));
    append(ii,1:numel(newrow)) = newrow(:);    
end

%# Now append to A, stripping unneeded zeros
A = [A append(:, ~all(append==0,1))]
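
One caveat, related to the question in the comments about zeros: if B(:,4:end) can itself contain zeros, stripping columns with ~all(append==0,1) could also drop a column of genuine data. A possible variant, sketched here under that assumption, trims by the length of the longest appended row instead. The variable used and the output name out are hypothetical additions, not part of the code above.

```matlab
A = [1 2; 3 4];
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5];

needle   = permute(A(:,1), [2 3 1]);
haystack = B(:,1:3);
data     = B(:,4:end).';
inds     = any(bsxfun(@eq, haystack, needle), 2);

append = zeros(size(A,1), numel(data));
used   = 0;                               % longest appended row seen so far
for ii = 1:size(inds,3)
    newrow = data(:, inds(:,:,ii));
    append(ii, 1:numel(newrow)) = newrow(:);
    used = max(used, numel(newrow));
end

% Trim by length rather than by value, so zeros that are real data survive
out = [A append(:, 1:used)];
```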