My data lives in 128 dimensions. I'm trying to reduce it to 3 dimensions so I can visualize it while preserving the Euclidean distances, because the distance represents the similarity between two data points.
Original data X, shape 5 x 128 (5 data points, 128 features each):
[[ -4.46e-02 1.57e-01 2.17e-01 1.24e-01 6.01e-02 7.61e-02
6.38e-02 -1.05e-01 -2.55e-02 5.99e-02 -8.38e-02 5.93e-02
-1.58e-01 -1.05e-01 1.31e-01 -5.33e-02 -4.18e-02 9.32e-02
-1.62e-02 -9.19e-02 -1.30e-01 8.56e-02 -6.13e-02 3.78e-02
7.84e-02 -9.74e-02 -9.42e-02 7.47e-02 -4.65e-02 7.36e-03
-9.19e-04 1.37e-01 -8.52e-02 9.27e-02 6.50e-02 -2.61e-02
7.21e-02 -1.83e-01 -2.49e-02 -9.85e-03 1.57e-01 -7.98e-02
1.50e-01 -1.40e-01 -2.39e-02 4.19e-02 6.98e-02 -1.27e-02
-7.56e-02 4.44e-02 1.86e-01 -2.22e-03 -1.79e-02 -3.90e-02
7.72e-02 4.47e-02 -8.15e-02 -4.31e-02 -6.52e-03 7.73e-02
-1.37e-02 5.78e-02 -1.25e-01 -1.58e-01 1.37e-01 9.34e-02
-6.07e-03 -1.69e-01 -2.12e-01 2.14e-01 -4.05e-02 1.29e-01
4.42e-02 1.71e-01 -2.13e-02 8.00e-03 7.17e-02 4.57e-03
-6.55e-03 -1.66e-01 3.73e-02 1.01e-01 -1.26e-03 1.96e-02
5.44e-02 -1.04e-01 -5.32e-02 -1.57e-02 -6.31e-02 1.89e-01
2.43e-02 1.59e-02 9.13e-03 -4.41e-02 -5.96e-03 1.03e-01
4.33e-02 -3.94e-02 7.85e-02 3.61e-02 -2.32e-02 3.69e-03
-9.57e-03 -1.47e-02 2.61e-02 -4.15e-04 1.41e-02 -4.22e-02
-7.42e-02 1.07e-01 9.08e-03 3.45e-02 6.41e-02 -5.37e-02
1.57e-02 -1.91e-01 8.21e-02 3.31e-02 3.57e-02 1.37e-02
1.56e-01 6.25e-02 4.54e-02 -1.07e-02 1.08e-01 2.69e-02
9.57e-02 -1.24e-01]
...
]
Original distance matrix dist:
from scipy.spatial.distance import pdist, squareform
dist = DataArray(squareform(pdist(X, 'euclidean')))
[[ 0. , 0.67, 0.62, 0.7 , 0.67],
[ 0.67, 0. , 0.48, 0.76, 0.46],
[ 0.62, 0.48, 0. , 0.7 , 0.48],
[ 0.7 , 0.76, 0.7 , 0. , 0.6 ],
[ 0.67, 0.46, 0.48, 0.6 , 0. ]]
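A minimal sketch of how this distance matrix is built. Because X is elided above, the sketch uses random stand-in data of the same 5 x 128 shape (an assumption, not the real data) and checks scipy's pdist/squareform result against the direct formula ||x_i - x_j||.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Stand-in data: the real X is elided, so use random values of the same shape (5 x 128).
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(5, 128))

# pdist returns the condensed upper triangle: n*(n-1)/2 = 10 distances for 5 points.
condensed = pdist(X, 'euclidean')
# squareform expands it into the full symmetric 5 x 5 matrix with a zero diagonal.
dist = squareform(condensed)

# The same matrix computed directly from the definition ||x_i - x_j||_2.
diff = X[:, None, :] - X[None, :, :]
dist_manual = np.sqrt((diff ** 2).sum(axis=-1))

print(condensed.shape)                 # (10,)
print(np.allclose(dist, dist_manual))  # True
```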
t-SNE:
from sklearn.manifold import TSNE
model = TSNE(n_components=3, random_state=0)
x_tsne = model.fit_transform(X)
x_tsne:
[[ 1.78e-04 4.02e-05 1.01e-04]
[ 2.25e-04 1.90e-04 -1.00e-04]
[ 9.43e-05 -1.72e-05 -1.21e-05]
[ 4.02e-05 1.36e-05 1.49e-04]
[ 7.44e-05 1.08e-05 4.45e-05]]
dist_tsne:
[[ 0.00e+00, 2.55e-04, 1.52e-04, 1.49e-04, 1.22e-04],
[ 2.55e-04, 0.00e+00, 2.60e-04, 3.57e-04, 2.75e-04],
[ 1.52e-04, 2.60e-04, 0.00e+00, 1.72e-04, 6.62e-05],
[ 1.49e-04, 3.57e-04, 1.72e-04, 0.00e+00, 1.10e-04],
[ 1.22e-04, 2.75e-04, 6.62e-05, 1.10e-04, 0.00e+00]]
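As a sanity check, dist_tsne is the same pdist/squareform computation applied to the 3-D embedding. The sketch below re-enters x_tsne and dist_tsne exactly as printed; since those values are rounded to three significant digits, the match is only approximate, hence the loose relative tolerance.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# t-SNE embedding as printed above (rounded to 3 significant digits).
x_tsne = np.array([
    [1.78e-04, 4.02e-05, 1.01e-04],
    [2.25e-04, 1.90e-04, -1.00e-04],
    [9.43e-05, -1.72e-05, -1.21e-05],
    [4.02e-05, 1.36e-05, 1.49e-04],
    [7.44e-05, 1.08e-05, 4.45e-05],
])

# dist_tsne as printed above.
dist_tsne_printed = np.array([
    [0.00e+00, 2.55e-04, 1.52e-04, 1.49e-04, 1.22e-04],
    [2.55e-04, 0.00e+00, 2.60e-04, 3.57e-04, 2.75e-04],
    [1.52e-04, 2.60e-04, 0.00e+00, 1.72e-04, 6.62e-05],
    [1.49e-04, 3.57e-04, 1.72e-04, 0.00e+00, 1.10e-04],
    [1.22e-04, 2.75e-04, 6.62e-05, 1.10e-04, 0.00e+00],
])

# Pairwise Euclidean distances in the 3-D embedding.
dist_tsne = squareform(pdist(x_tsne, 'euclidean'))

# Agreement up to the rounding of the printed coordinates.
print(np.allclose(dist_tsne, dist_tsne_printed, rtol=2e-2))  # True
```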
I compared dist and dist_tsne and noticed that the values are not the same; they are not even proportional. How can I preserve the Euclidean distances while reducing the dimensionality?
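To make "not proportional" concrete: if the embedding preserved distances up to a single global scale factor, the ratio dist_tsne / dist would be identical for every pair of points. This sketch uses only the two matrices printed in this post, compares each distinct pair once, and also checks which pair is closest in each space (indices are 0-based).

```python
import numpy as np

# Original 128-D distances, as printed above.
dist = np.array([
    [0.00, 0.67, 0.62, 0.70, 0.67],
    [0.67, 0.00, 0.48, 0.76, 0.46],
    [0.62, 0.48, 0.00, 0.70, 0.48],
    [0.70, 0.76, 0.70, 0.00, 0.60],
    [0.67, 0.46, 0.48, 0.60, 0.00],
])

# 3-D t-SNE distances, as printed above.
dist_tsne = np.array([
    [0.00e+00, 2.55e-04, 1.52e-04, 1.49e-04, 1.22e-04],
    [2.55e-04, 0.00e+00, 2.60e-04, 3.57e-04, 2.75e-04],
    [1.52e-04, 2.60e-04, 0.00e+00, 1.72e-04, 6.62e-05],
    [1.49e-04, 3.57e-04, 1.72e-04, 0.00e+00, 1.10e-04],
    [1.22e-04, 2.75e-04, 6.62e-05, 1.10e-04, 0.00e+00],
])

# Each distinct pair once: upper triangle without the zero diagonal.
iu = np.triu_indices(5, k=1)
ratio = dist_tsne[iu] / dist[iu]

# A constant ratio would mean "proportional"; measure how far it spreads.
spread = ratio.max() / ratio.min()
print(round(spread, 2))  # 4.33

# Closest pair in each space, as plain (i, j) tuples.
pairs = [(int(i), int(j)) for i, j in zip(*iu)]
closest_orig = pairs[int(np.argmin(dist[iu]))]
closest_tsne = pairs[int(np.argmin(dist_tsne[iu]))]
print(closest_orig, closest_tsne)  # (1, 4) (2, 4)
```

With these numbers the largest ratio is roughly 4.3 times the smallest, and even the nearest neighbours change: points 1 and 4 are closest in the original space, while points 2 and 4 are closest in the t-SNE embedding.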