Alternative to using squareform (Matlab)

Question

At the moment i am using the pdist function in Matlab, to calculate the euclidian distances between various points in a three dimensional cartesian system. I'm doing this because i want to know which point has the smallest average distance to all the other points (the medoid). The syntax for pdist looks like this:

% calculate distances between all points
distances = pdist(m);

But because pdist returns a one dimensional array of distances, there is no easy way to figure out which point has the smallest average distance (directly). Which is why i am using squareform and then calculating the smallest average distance, like so:

% convert found distances to matrix of distances
distanceMatrix = squareform(distances);

% find index of point with smallest average distance
[~,j] = min(mean(distanceMatrix,2));

The distances are averaged for each column, and the variable j is the index for the column (and the point) with the smallest average distance.

This works, but squareform takes a lot of time (this piece of code is repeated thousands of times), so i am looking for a way to optimise it. Does anyone know of a faster way to deduce the point with the smallest average distance from the results of pdist?

Do you have the same number of points for each call of your mediod function? — reve_etrange
Also, do you need the pairwise distances themselves? Or just the identity of the mediod? — reve_etrange
@reve_etrange, the number of points is different from time to time. And in the end all i need is the coordinates of the medoid (so i dont actually need the distances themselves, but use them to calculate the medoid). — user1218247
I experimented with precomputation of the indices. I think it can be faster if there are always the same number of points. Otherwise @yuk's answer is fastest AFAIK. I have nearly identical code in some of my functions. — reve_etrange

yuk yuk · Accepted Answer · 2012-03-12T05:41:32

I think for your task using SQUAREFORM function is the best way from vectorization view point. If you look at the content of this function by

edit squareform

You will see that it performs a lot of checks that take time of course. Since you know your input to squareform and can be sure it will work, you can create your custom function with just the core of squareform.

[r, c] = size(m);
distanceMatrix = zeros(r);
distanceMatrix(tril(true(r),-1)) = distances;
distanceMatrix = distanceMatrix + distanceMatrix';

Then run the same code as you did to find the medioid.

Alternative to using squareform (Matlab)

4 Answers