3
votes

New to Python and programming in general:

The documentation to squareform states the following:

Converts a vector-form distance vector to a square-form distance matrix, and vice-versa.

Does this mean it converts a 1D array into a square matrix?

Where the parameter X is:

Either a condensed or redundant distance matrix.

and returns:

If a condensed distance matrix is passed, a redundant one is returned, or if a redundant one is passed, a condensed distance matrix is returned.

  1. What is the difference between condensed and redundant matrices?
  2. What is the relationship between condensed/redundant matrices and the vector/square forms they take?

pdist appears to return a condensed distance matrix:

Returns a condensed distance matrix Y. For each i and j (where i < j < n), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.

Am I right in thinking that each element of Y stores the distance between one particular point and another point? Would an example with 3 observations mean a condensed matrix with 9 elements?

Will, does stackoverflow.com/questions/13079563/… look like a duplicate of your question? - Warren Weckesser
@WarrenWeckesser related but different: stackoverflow.com/questions/13079563/ takes the terms I am asking about for granted, and so begs the question, unless I am missing something. - user6204921
When we say: "If y is a 1d condensed distance matrix, then y must be a (n 2) sized vector where n is the number of original observations paired in the distance matrix." what does (n 2) mean? - akshit bhatia

1 Answer

1
votes

If you have an n x n distance matrix, then each pairwise combination of the n points appears twice, once in each order, ab and ba. So if you create a distance matrix from a set of n points, you can condense the data by storing each pair only once and leaving out the comparisons between points and themselves (the zero diagonal).
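
As a rough check of that counting argument, here is a small sketch in plain Python. The point labels are made up for illustration. It lists every ordered pair (the off-diagonal cells of the square matrix), then every unordered pair (what the condensed form keeps), and compares the count with n(n-1)/2, which is the binomial coefficient "n choose 2".

```python
from itertools import combinations, permutations
from math import comb  # available in Python 3.8+

points = ["a", "b", "c"]  # hypothetical labels, for illustration only
n = len(points)

# Ordered pairs: every off-diagonal cell of the n x n matrix (ab and ba both appear).
ordered_pairs = ["".join(p) for p in permutations(points, 2)]

# Unordered pairs: what the condensed form keeps (only i < j).
unordered_pairs = ["".join(p) for p in combinations(points, 2)]

print(ordered_pairs)    # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
print(unordered_pairs)  # ['ab', 'ac', 'bc']

# The condensed length is "n choose 2" = n * (n - 1) / 2.
assert len(unordered_pairs) == comb(n, 2) == n * (n - 1) // 2
```

So three observations give a condensed vector of 3 entries, not 9. Nine is the size of the full 3 x 3 square form.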

For example, if we have the points a, b, and c, we would have the distance matrix

    a    b    c
a   0    ab   ac
b   ba   0    bc
c   ca   cb   0

and the condensed distance matrix,

    a    b    c
a        ab   ac
b             bc

Because distance measures are symmetric (the distance from a to b equals the distance from b to a), the condensed table retains all the information.
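
To see the two forms side by side, here is a minimal sketch using `pdist` and `squareform` from `scipy.spatial.distance`. The three 2-D points are invented for this example and chosen so the Euclidean distances (the default metric of `pdist`) are easy to check by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three made-up 2-D points (a, b, c), chosen so the distances are easy to check.
X = np.array([[0.0, 0.0],   # a
              [3.0, 0.0],   # b
              [0.0, 4.0]])  # c

condensed = pdist(X)            # 1-D vector ordered as [ab, ac, bc]
square = squareform(condensed)  # 3 x 3 symmetric matrix with a zero diagonal
back = squareform(square)       # passing the square form returns the condensed vector

print(condensed)  # [3. 4. 5.]
print(square)
print(back)
```

In this sketch the "condensed" form is the 1-D vector and the "redundant" form is the square matrix, so "vector-form" and "condensed" refer to the same thing, as do "square-form" and "redundant".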