3
votes

Let's say I have two vectors A and B of different lengths (length(A) is not equal to length(B)), and the values in A are not the same as those in B. I want to compare each value of B with the values of A, where "compare" means checking whether B(i) is almost equal to some value in A(1:end), for example B(i)-Tolerance < A(j) < B(i)+Tolerance for at least one j.

How can I do this without using a for loop, since the data is huge?

I know ismember(F), intersect, repmat, and find, but none of those functions really helps me.

5
So you're only comparing A(i) with B(i)? Why not post the existing for loop code and people might be able to suggest improvement from there. - weston
Here is a solution for ismember with a tolerance. It is about twice as slow as the solution posted by @ondav but does handle the tolerance more accurately. mathworks.com/matlabcentral/fileexchange/23294-ismemberf/… - Dennis Jaheruddin

5 Answers

3
votes

You may try a solution along these lines:

tol = 0.1; 

N = 1000000; 

a = randn(1, N)*1000; % create random test data

b = a + tol*rand(1, N); % each b(i) is less than tol away from a(i)

a_bin = floor(a/tol); 
b_bin = floor(b/tol); 

result = ismember(b_bin, a_bin) | ...
         ismember(b_bin, a_bin-1) | ...
         ismember(b_bin, a_bin+1); 

find(result==0) % should be empty matrix. 

The idea is to discretize the a and b variables to bins of size tol. Then, you ask whether b is found in the same bin as any element from a, or in the bin to the left of it, or in the bin to the right of it.

Advantages: I believe ismember is clever inside, first sorting the elements of a and then performing a sublinear (log(N)) search for each element of b. This is unlike approaches that explicitly compute the difference between each element of b and every element of a, where the cost per element of b is linear in the number of elements of a.

Comparison: for N=100000 this runs in 0.04 s on my machine, compared to 20 s using linear search (timed using Alan's nice and concise tf = arrayfun(@(bi) any(abs(a - bi) < tol), b); solution).

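Disadvantages: the effective tolerance is not exactly tol. Every pair closer than tol is caught, but pairs that sit at opposite ends of adjacent bins are also accepted, so the actual tolerance is anywhere between tol and just under 2*tol. Whether you can live with that depends on your task (if the only concern is floating point comparison, you can).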
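To make the looser tolerance concrete, here is a minimal check with two hand-picked values (the numbers are illustrative assumptions, not taken from the answer). The pair is 0.19 apart, well beyond tol = 0.1, yet the binned test still reports a match because the two values land in adjacent bins:

```matlab
tol = 0.1;
a = 0;       % lower edge of bin 0
b = 0.19;    % near the upper edge of bin 1

a_bin = floor(a/tol);   % 0
b_bin = floor(b/tol);   % 1

% binned test from the answer: same bin, or one bin to either side
binned = ismember(b_bin, a_bin) | ...
         ismember(b_bin, a_bin-1) | ...
         ismember(b_bin, a_bin+1);

% exact test for comparison
exact = abs(a - b) < tol;

disp([binned exact])    % binned is true, exact is false
```

If false positives of this kind matter, the binned result could be used as a cheap pre-filter, followed by an exact check on the candidates only.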

Note: whether this is a viable approach depends on the ranges of a and b and on the value of tol. If a and b can be very large and tol is very small, a_bin and b_bin will not be able to resolve individual bins in floating point (you would then have to work with integer types, again checking carefully that their ranges suffice). The solution with loops is the safer one, but if you really need speed, you can invest in optimizing the presented idea. Another option, of course, would be to write a mex extension.
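If the exact tolerance matters, one alternative that keeps the sort-then-search cost profile is a nearest-neighbour lookup. This is a different technique from the binning above, sketched here under a few assumptions: a contains at least two distinct values, the sample data is made up for illustration, and interp1 with the 'nearest' method and 'extrap' is used to return the closest element of a for every element of b:

```matlab
tol = 0.1;
a = [0 1.0 5.0 5.0];         % illustrative data with a duplicate
b = [0.19 0.95 3.0 5.09];

as = unique(a);              % sorted and distinct, as interp1 requires

% for every b, look up the closest value in a;
% 'extrap' maps values outside [min(a), max(a)] to the nearest end point
nearest_a = interp1(as, as, b, 'nearest', 'extrap');

% exact tolerance test against the closest candidate only
tf = abs(b - nearest_a) < tol;

disp(tf)                     % 0 1 0 1
```

Because only the closest element of a can be within tol if any element is, checking that single candidate is sufficient, and no n-by-m difference matrix is ever built.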

2
votes

It sounds like what you are trying to do is have an ismember function for use on real-valued data.

That is, check for each value B(i) in your vector B whether B(i) is within the tolerance threshold T of at least one value in your vector A.

This works out something like the following:

tf = false(1, length(b)); %//the result vector, true if that element of b is in a
t = 0.01; %// the tolerance threshold
for i = 1:length(b)
    %// is the absolute difference between each
    %// element of a and b(i) less than the threshold?
    matches = abs(a - b(i)) < t; 

    %// if b(i) matches any of the elements of a
    tf(i) = any(matches);
end

Or, in short:

t = 0.01;
tf = arrayfun(@(bi) any(abs(a - bi) < t), b);

Regarding avoiding the for loop: while this might benefit from vectorization, you may also want to consider looking at parallelisation if your data is that huge. In that case having a for loop as in my first example can be handy since you can easily do a basic version of parallel processing by changing the for to parfor.
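As a rough illustration of the parfor variant mentioned above, the loop can be written so that each iteration touches only its own element of tf. The data below is made up for the example; in MATLAB, parfor needs the Parallel Computing Toolbox to actually run in parallel, and to my understanding it otherwise behaves like an ordinary loop:

```matlab
a = [1.00 2.50 4.00];           % illustrative data
b = [0.995 2.00 4.004 7.00];
t = 0.01;                       % tolerance threshold

tf = false(1, numel(b));        % preallocate the sliced output
parfor i = 1:numel(b)
    % a and t are broadcast to every worker; tf(i) is written by one iteration only
    tf(i) = any(abs(a - b(i)) < t);
end

disp(tf)                        % 1 0 1 0
```

Whether this pays off depends on the data size, since starting workers and shipping a to each of them has its own cost.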

1
votes

Here is a fully vectorized solution. Note that I would actually recommend the solution given by @Alan, as mine is not likely to work for big datasets.

[X, Y] = meshgrid(A, B);
M = abs(X - Y) < tolerance;

Now the logical index of the elements of A that are within the tolerance of some element of B can be obtained with any(M), and the corresponding index for B is found with any(M,2).
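A small worked example may help with the orientation of M. The values are made up for illustration; rows of M follow B and columns follow A:

```matlab
A = [1.0 2.0 3.0];
B = [0.95 2.5 3.02 9.0];
tolerance = 0.1;

[X, Y] = meshgrid(A, B);         % both are numel(B)-by-numel(A), here 4-by-3
M = abs(X - Y) < tolerance;

A_hit = any(M, 1);               % per element of A: close to some B?
B_hit = any(M, 2).';             % per element of B: close to some A?

disp(A_hit)                      % 1 0 1
disp(B_hit)                      % 1 0 1 0
```

Note that X, Y and M each hold numel(A)*numel(B) entries, which is why this approach runs out of memory on large inputs.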

1
votes

bsxfun to the rescue

 >> M = abs( bsxfun(@minus, A, B' ) ); %//' difference
 >> M < tolerance 
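For large inputs the full numel(B)-by-numel(A) matrix may not fit in memory. One possible workaround is to process B in blocks, so that only a slice of the difference matrix exists at any time. The sketch below assumes A and B are row vectors; chunk is a hypothetical block size that would need tuning to the available memory, and the data is made up:

```matlab
A = 0:0.1:5;                     % illustrative data, 51 elements
B = 0.003 + (0:0.037:5);
tolerance = 0.005;
chunk = 25;                      % hypothetical block size

inA = false(1, numel(B));        % inA(i) is true if B(i) is close to some A
for k = 1:chunk:numel(B)
    idx = k:min(k + chunk - 1, numel(B));
    % differences for this block only: numel(idx)-by-numel(A)
    D = abs(bsxfun(@minus, A, B(idx).'));
    inA(idx) = any(D < tolerance, 2).';
end
```

This keeps the inner work vectorized while bounding peak memory at roughly chunk*numel(A) elements.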

0
votes

Another way to do what you want is with a logical expression.
Since A and B are vectors of different sizes you can't simply subtract and look for values that are smaller than the tolerance, but you can do the following:

Lmat = sparse((abs(repmat(A,[numel(B) 1])-repmat(B',[1 numel(A)])))<tolerance);

and you will get a sparse logical matrix containing one nonzero entry for every pair of elements that are equal within the tolerance. You could then count how many such pairs you have by writing:

Nequal = sum(sum(Lmat));

You could also get the indexes of the corresponding elements by writing:

[r,c] = find(Lmat);

then the following expression will be true (for all j in 1:numel(r)), since rows of Lmat correspond to B and columns to A:

abs(B(r(j))-A(c(j)))<tolerance

Finally, you should note that this way you get multiple counts in case there are duplicate entries in A or in B. It may be advisable to use the unique function first. For example:

A_new = unique(A);
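To illustrate the effect of duplicates on the count, here is a minimal sketch with made-up data. The helper pairwise is a hypothetical name introduced only for this example; it wraps the repmat expression from above:

```matlab
A = [1.0 1.0 2.0];               % note the duplicate 1.0
B = [1.05 5.0];
tolerance = 0.1;

% hypothetical helper: rows follow v (B), columns follow u (A)
pairwise = @(u, v) sparse(abs(repmat(u, [numel(v) 1]) - ...
                              repmat(v.', [1 numel(u)])) < tolerance);

Lmat = pairwise(A, B);
Nequal = full(sum(Lmat(:)));     % 2: B(1) matches both copies of 1.0

[r, c] = find(Lmat);
all_close = all(abs(B(r(:).') - A(c(:).')) < tolerance);   % true

A_new = unique(A);               % [1.0 2.0]
Lmat_new = pairwise(A_new, B);
Nequal_new = full(sum(Lmat_new(:)));   % 1: the duplicate no longer counts twice
```

Keep in mind that unique also sorts its output, so column indices into A_new do not correspond to positions in the original A.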