0
votes

Is there a function in MATLAB that lets me find the first element of string cell array A that also belongs to string cell array B?

I'm currently using:

    i = find(ismember(A,B));
    string = A{i(1)};

But I'd like to know if there's a function that doesn't evaluate ismember all the way to the last element of A, but instead stops as soon as it finds the first match. The reason is that A contains around 1,800,000 strings and I'm only interested in the first match.

Would a for loop be faster if I did:

    for j=1:length(A)
      if ismember(A{j}, B)
        string = A{j};
        break
      end
    end

??

Does the number of elements in A even influence the time required for calculating ismember?

Thank you.

1
I don't know if there's such a function, and I don't get why you can't do the timing tests yourself with tic/toc. Anyway, if A contains more elements than B, maybe ismember(B,A) will be faster (only when there are matches, of course)? - Ghislain Bugnicourt
I can do the timing tests myself, I'm just asking in case someone already knows. Why would it be faster to do ismember(B,A)? I can't tell, because I don't know exactly what happens inside ismember. Thank you! - ACenTe25
Also, if numerous strings are repeated in A, maybe [C,ia,ic] = unique(A) would be useful? Not sure if it's faster though. :/ - Ghislain Bugnicourt
A is already "unique". - ACenTe25
About ismember(B,A) I don't know for sure, but I guess the search for a value stops when it is met. Let's say B has only two values, with one of them present in the middle of A. I guess ismember(A,B) would need 1,800,000*2 searching steps, while ismember(B,A) would only require 1,800,000*1.5 steps. To sum up, if B has multiple values in A, it may go faster. - Ghislain Bugnicourt

1 Answer

1
votes

There are some optional arguments to find that allow you to get only the first N results. I haven't verified that this causes short-circuit evaluation; it depends on whether MATLAB's JIT compiler is reordering operations to do ismember as-needed.

    i = find(ismember(A,B), 1, 'first');

From the documentation:

ind = find(X, k) or ind = find(X, k, 'first') returns at most the first k indices corresponding to the nonzero entries of X. k must be a positive integer, but it can be of any numeric data type.
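To make the behaviour concrete, here is a small sketch with made-up data (the contents of A and B are placeholders, not the asker's real arrays). As far as I know, MATLAB evaluates the ismember(A,B) argument in full before find is called, so the k argument only limits how many indices find returns, not how much of A ismember examines.

```matlab
% Hypothetical sample data standing in for the cell arrays in the question.
A = {'delta', 'alpha', 'echo', 'bravo'};
B = {'bravo', 'alpha'};

% ismember builds a logical mask with one entry per element of A.
mask = ismember(A, B);

% find then returns at most one index: the first true entry of the mask.
idx = find(mask, 1, 'first');

% Guard against the no-match case before indexing into A.
if isempty(idx)
    firstMatch = '';
else
    firstMatch = A{idx};
end
```

With this data the mask has true entries at positions 2 and 4, and firstMatch ends up as 'alpha'.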

Your current workaround looks both straightforward and guaranteed to have the desired complexity.
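If calling ismember once per element turns out to be slow in your timing tests, a possible middle ground is to run ismember on blocks of A and stop at the first block that contains a match. This is only a sketch under assumptions: the sample data and chunkSize are made up (for 1,800,000 strings something in the thousands would be a more plausible starting point), and I haven't timed it against the other two approaches.

```matlab
% Hypothetical sample data standing in for the cell arrays in the question.
A = {'delta', 'alpha', 'echo', 'bravo'};
B = {'bravo', 'alpha'};

chunkSize = 2;      % assumed block size; tune it with tic/toc on real data
firstMatch = '';    % stays empty if no element of A is found in B

for startIdx = 1:chunkSize:numel(A)
    % Clamp the last block so it does not run past the end of A.
    stopIdx = min(startIdx + chunkSize - 1, numel(A));

    % Vectorised membership test on this block only.
    k = find(ismember(A(startIdx:stopIdx), B), 1, 'first');

    if ~isempty(k)
        % Convert the block-relative index back to an index into A.
        firstMatch = A{startIdx + k - 1};
        break
    end
end
```

The loop stops after the first block that contains a match, so the blocks after it are never passed to ismember, while each block still gets the vectorised call.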