2
votes

The MATLAB function, pdist, computes pairwise distances for a set of points. In order to save space and time, it elides repeat ( dist(A,B) == dist(B,A) ) and self ( dist(A,A) == 0 ) comparisons. The result is returned in a single vector, with indices [1 2 3 4 5 6 7 8 9 10]. Expanding into a full distance matrix ( e.g. by the function squareform ), the positions are mapped like:

-  -  -  -  -
1  -  -  -  -    # "1" is subscript "1,2"
2  5  -  -  -    
3  6  8  -  -    # "6" is subscript "4,2"
4  7  9  10 -

Given a vector such as that produced by pdist, what is the expression to convert an index into that vector into the subscript indices of the full distance matrix?

For example, I wish to find, in a set of points, the two points most distant from each other:

d = pdist(points); %# points is M x 2 (or 3, etc.), d is described above
N = length(d);
[~,I] = max(d); %# I is the index, e.g. 6
CI = ceil( sqrt( 2*N ) ) - round( sqrt( 2*(1 + N - I) ) ); %# Identity of one point, e.g. 4 is I == 6
RI = ????  %# Identity of second point, e.g. 2, if I == 6 and CI == 4

As indicated above, I have the first part, to find the first subscript. I can't quite figure out the second one. There are a number of related questions (vide infra) but the answers aren't correct (or are specific to 0-index, row-major languages).

2
I'm not sure if I fully understand your question, would you be able to rephrase?gkiar
Is it sufficiently explained now?reve_etrange

2 Answers

5
votes

@reve_etrange I believe N should be the length of d, not the number of rows of your points matrix.

K=5; % Number of points
d = pdist(rand(K,3));
N = length(d);
m = ceil(sqrt(2*N));
I = 1:N;
CI = m - round( sqrt( 2*(1 + N - I) ) );
RI = mod(I + CI.*(CI+1)/2 - 1, m) + 1;

This results in

I =    1     2     3     4     5     6     7     8     9    10
RI =   2     3     4     5     3     4     5     4     5     5
CI =   1     1     1     1     2     2     2     3     3     4
0
votes

Some parts of the question are still nonsense, but I believe this is what you're trying to do...

Find the subscripts of the nonzero elements of a lower triangular matrix that is the same size as your distance matrix:

>> points = randn(5,2);
>> [i,j] = find(tril(ones(size(points, 1)), -1))

i =

     2
     3
     4
     5
     3
     4
     5
     4
     5
     5


j =

     1
     1
     1
     1
     2
     2
     2
     3
     3
     4