1
votes

Assume there are three groups of high-dimensional vectors:

{a_1, a_2, ..., a_N},

{b_1, b_2, ... , b_N},

{c_1, c_2, ..., c_N}.

Each of my vectors can be represented as x = a_i + b_j + c_k, where 1 <= i, j, k <= N. The vector is then encoded as (i, j, k), which can later be decoded back to x = a_i + b_j + c_k.
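In code, the scheme looks roughly like this (a minimal sketch, assuming the groups are stored as N x M arrays A, B, C whose rows are the a_i, b_j, c_k, with 0-based indices):

    import numpy as np

    def decode(code, A, B, C):
        """Restore the M-dimensional vector x from its (i, j, k) code."""
        i, j, k = code
        return A[i] + B[j] + C[k]  # x = a_i + b_j + c_k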

My question is: given two vectors x = (i_1, j_1, k_1) and y = (i_2, j_2, k_2), is there a method to compute the Euclidean distance between them without decoding x and y?

3 Comments
Confused by your notation. What is the format for one vector? – Yin Zhu
What do you mean by "without decode"? – Carlos
The vectors are in R^M, so M floats are needed to store one vector. If a vector is encoded as (i, j, k), only 3 ints are needed. – chyojn
Decoding a vector means restoring the M float values from the 3 int values: (i, j, k) ==> a_i + b_j + c_k. If I decode the vectors, then it is easy to compute their distance. – chyojn
The three groups of vectors {a_1, a_2, ..., a_N}, {b_1, b_2, ..., b_N}, {c_1, c_2, ..., c_N} can be treated as three sets of "basis vectors". – chyojn

3 Answers

3
votes

Square root of the sum of squares of the differences between components. There's no other way to do it.

You should scale the values to guard against overflow/underflow issues. Search for the maximum absolute difference and divide all the components by it before squaring, summing, and taking the square root; then multiply the result back by that maximum to restore the scale.
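A short sketch of that trick in Python (the function name is mine):

    import math

    def scaled_euclidean(x, y):
        """Euclidean distance, scaled to guard against overflow/underflow:
        divide by the largest |difference|, then multiply the result back."""
        diffs = [xi - yi for xi, yi in zip(x, y)]
        scale = max(abs(d) for d in diffs)
        if scale == 0.0:
            return 0.0  # identical vectors
        return scale * math.sqrt(sum((d / scale) ** 2 for d in diffs))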

1
votes

Let's assume you have only two groups. You are trying to compute the scalar product

(a_i1 + b_j1, a_i2 + b_j2)
  = (a_i1, a_i2) + (b_j1, b_j2) + (a_i1, b_j2) + (a_i2, b_j1)   <- elementary scalar products

So if you know all the elementary scalar products between the vectors a_i, b_j, c_k, then you do not need to "decode" x and y and can compute the scalar product directly. The squared Euclidean distance then follows from ||x - y||^2 = (x, x) - 2(x, y) + (y, y).

Note that this is exactly what happens when you compute an ordinary Euclidean distance in a non-orthogonal basis.
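Here is a sketch of the full idea for the three-group case (all names and sizes are assumed): precompute the six N x N Gram matrices of elementary scalar products once, then answer every distance query from table lookups alone.

    import numpy as np

    # Assumed setup: N vectors per group, dimension M; rows of A, B, C are the a_i, b_j, c_k.
    N, M = 100, 512
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((N, M)) for _ in range(3))

    # Precompute the six N x N Gram matrices of elementary scalar products.
    G = {("A", "A"): A @ A.T, ("B", "B"): B @ B.T, ("C", "C"): C @ C.T,
         ("A", "B"): A @ B.T, ("A", "C"): A @ C.T, ("B", "C"): B @ C.T}

    def encoded_distance(x, y):
        """Euclidean distance between codes x = (i1, j1, k1) and y = (i2, j2, k2),
        computed from Gram-matrix lookups only -- no decoding."""
        (i1, j1, k1), (i2, j2, k2) = x, y

        def diff_dot(P, p1, p2, Q, q1, q2):
            # (p_p1 - p_p2, q_q1 - q_q2) expanded via the (P, Q) Gram matrix
            g = G[(P, Q)]
            return g[p1, q1] - g[p1, q2] - g[p2, q1] + g[p2, q2]

        # x - y = (a_i1 - a_i2) + (b_j1 - b_j2) + (c_k1 - c_k2); expand ||x - y||^2.
        sq = (diff_dot("A", i1, i2, "A", i1, i2)
              + diff_dot("B", j1, j2, "B", j1, j2)
              + diff_dot("C", k1, k2, "C", k1, k2)
              + 2 * diff_dot("A", i1, i2, "B", j1, j2)
              + 2 * diff_dot("A", i1, i2, "C", k1, k2)
              + 2 * diff_dot("B", j1, j2, "C", k1, k2))
        return np.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

    # Sanity check against the decoded computation:
    x, y = (3, 7, 1), (42, 7, 99)
    direct = np.linalg.norm((A[3] + B[7] + C[1]) - (A[42] + B[7] + C[99]))
    assert np.isclose(encoded_distance(x, y), direct)

After the one-time O(N^2 M) precomputation, each query costs a constant number of lookups, at the price of storing six N x N tables.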

0
votes

If you are happy with an approximate result, you could project your high-dimensional basis vectors into a lower-dimensional space using a random projection. The Johnson-Lindenstrauss lemma says you can reduce the dimension to O(log N) while pairwise distances remain approximately the same with high probability.
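A minimal sketch of such a projection (the dimensions are made up, and a plain Gaussian matrix stands in for any JL transform):

    import numpy as np

    M, d = 512, 64  # original and reduced dimensions, d on the order of log N
    rng = np.random.default_rng(1)
    R = rng.standard_normal((d, M)) / np.sqrt(d)  # 1/sqrt(d) keeps norms unbiased

    u, v = rng.standard_normal(M), rng.standard_normal(M)
    exact = np.linalg.norm(u - v)
    approx = np.linalg.norm(R @ u - R @ v)
    print(exact, approx)  # approx ~= exact with high probability

Since the projection is linear, R(a_i + b_j + c_k) = R a_i + R b_j + R c_k, so you can project the 3N basis vectors once and keep using the (i, j, k) codes, now decoding in the reduced space.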