I need to calculate the Euclidean distance between every row of a CSR sparse matrix and each point in a list of points. It would be easier for me to convert the CSR matrix to a dense one, but I can't because there isn't enough memory, so I need to keep it as CSR.
For example, I have this data_csr sparse matrix (shown in both CSR and dense form):
data_csr
(0, 2) 4
(1, 0) 1
(1, 4) 2
(2, 0) 2
(2, 3) 1
(3, 5) 1
(4, 0) 4
(4, 2) 3
(4, 3) 2
data_csr.todense()
matrix([[0, 0, 4, 0, 0, 0],
        [1, 0, 0, 0, 2, 0],
        [2, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1],
        [4, 0, 3, 2, 0, 0]])
and this array of center points:
center
array([[0, 1, 2, 2, 4, 1],
[3, 4, 1, 2, 4, 0]])
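To reproduce the example without ever allocating the dense matrix, data_csr can be built straight from the (row, column) value triples listed above. This is a minimal sketch; the variable names rows, cols and vals are just for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# (row, column) -> value triples, copied from the data_csr listing above
rows = np.array([0, 1, 1, 2, 2, 3, 4, 4, 4])
cols = np.array([2, 0, 4, 0, 3, 5, 0, 2, 3])
vals = np.array([4, 1, 2, 2, 1, 1, 4, 3, 2])

# csr_matrix((data, (row_ind, col_ind)), shape=...) builds the CSR matrix
# from coordinate triples, so no dense 5 x 6 array is ever created
data_csr = csr_matrix((vals, (rows, cols)), shape=(5, 6))

# center is small, so a plain dense NumPy array is fine here
center = np.array([[0, 1, 2, 2, 4, 1],
                   [3, 4, 1, 2, 4, 0]])
```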
Using the scipy.spatial package, the expected Euclidean distance array between center and data_csr looks like the one below. Each row of center is a point with 6 coordinates, and it is compared against every row of data_csr, so the result has shape (2, 5): the first row of the result holds the distances between the first row of center and each of the 5 rows of data_csr.
scipy.spatial.distance.cdist(center, data_csr.toarray(), 'euclidean')
array([[ 5.09901951, 3.87298335, 5.19615242, 5. , 5.91607978],
[ 7.34846923, 5.38516481, 5.91607978, 6.8556546 , 6.08276253]])
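Written out, each entry of that array is the following, where c_i is row i of center and x_j is row j of data_csr (notation introduced here, not from the question). The expanded right-hand form is what makes a sparse computation possible: the squared norm of x_j and the dot product c_i · x_j only involve the nonzero entries of x_j.

$$
D_{ij} = \lVert c_i - x_j \rVert_2
       = \sqrt{\sum_{k=1}^{6} \left(c_{ik} - x_{jk}\right)^2}
       = \sqrt{\lVert c_i \rVert_2^2 - 2\, c_i \cdot x_j + \lVert x_j \rVert_2^2}
$$

As a check, take the first row of center, c = (0, 1, 2, 2, 4, 1), and the fourth row of data_csr, x = (0, 0, 0, 0, 0, 1): the squared norm of c is 26, c · x = 1 and the squared norm of x is 1, so D = sqrt(26 - 2 + 1) = sqrt(25) = 5, which matches the 5 in the first row of the output above.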
What I've learned so far is that I can get the nonzero values, as well as their column indices, with:
data_csr.data
array([4, 1, 2, 2, 1, 1, 4, 3, 2])
data_csr.indices
array([2, 0, 4, 0, 3, 5, 0, 2, 3])
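data and indices alone do not say which row each value belongs to; the third CSR array, data_csr.indptr, does. The nonzeros of row i live in data[indptr[i]:indptr[i+1]], with their column positions in the same slice of indices. The sketch below walks the rows that way and computes each row's squared norm from the nonzeros only; the indptr values are derived from the listing above, and sq_norms is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build data_csr directly from its three CSR arrays: (data, indices, indptr)
data_csr = csr_matrix(
    (np.array([4, 1, 2, 2, 1, 1, 4, 3, 2]),   # nonzero values
     np.array([2, 0, 4, 0, 3, 5, 0, 2, 3]),   # column index of each value
     np.array([0, 1, 3, 5, 6, 9])),           # row i spans data[indptr[i]:indptr[i+1]]
    shape=(5, 6),
)

sq_norms = np.zeros(data_csr.shape[0])
for i in range(data_csr.shape[0]):
    start, end = data_csr.indptr[i], data_csr.indptr[i + 1]
    cols = data_csr.indices[start:end]   # where row i's nonzeros sit
    vals = data_csr.data[start:end]      # the nonzero values themselves
    print(i, cols, vals)
    # zeros add nothing to a squared norm, so the nonzeros are enough
    sq_norms[i] = np.sum(vals.astype(float) ** 2)

# Vectorized equivalent: elementwise square, then sum along each row
sq_norms_vec = np.asarray(data_csr.multiply(data_csr).sum(axis=1)).ravel()
```

The loop is only meant to show the layout; the vectorized line at the end does the same job without Python-level iteration.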
But I still can't figure out how to calculate the Euclidean distance between these two objects.
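One possible approach is to use the expansion ||c - x||^2 = ||c||^2 - 2 c·x + ||x||^2, which only needs row norms and a sparse-times-dense product, so data_csr is never densified. This is a sketch, not a library function: sparse_euclidean_cdist is a hypothetical name, and it assumes the number of centers is small enough that a dense (n_rows x n_centers) result fits in memory.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_euclidean_cdist(centers, X):
    """Euclidean distances between each row of dense `centers` (k x d) and
    each row of CSR matrix `X` (n x d); returns a dense (k x n) array.

    Uses ||c - x||^2 = ||c||^2 - 2 c.x + ||x||^2, so X is never densified.
    """
    centers = np.asarray(centers, dtype=float)
    c_sq = np.einsum("ij,ij->i", centers, centers)        # ||c_i||^2, shape (k,)
    x_sq = np.asarray(X.multiply(X).sum(axis=1)).ravel()   # ||x_j||^2 from nonzeros only, shape (n,)
    cross = np.asarray(X @ centers.T).T                    # c_i . x_j: sparse @ dense gives (n, k), transposed to (k, n)
    d2 = c_sq[:, None] - 2.0 * cross + x_sq[None, :]
    np.maximum(d2, 0.0, out=d2)                            # clip tiny negatives caused by round-off
    return np.sqrt(d2)


# Example data from the question, built directly in CSR form
data_csr = csr_matrix(
    (np.array([4, 1, 2, 2, 1, 1, 4, 3, 2]),   # data
     np.array([2, 0, 4, 0, 3, 5, 0, 2, 3]),   # indices
     np.array([0, 1, 3, 5, 6, 9])),           # indptr
    shape=(5, 6),
)
center = np.array([[0, 1, 2, 2, 4, 1],
                   [3, 4, 1, 2, 4, 0]])

dist = sparse_euclidean_cdist(center, data_csr)
print(dist)
```

On this small example the result can be compared with scipy.spatial.distance.cdist(center, data_csr.toarray(), 'euclidean'); both give the (2, 5) array shown above. If the matrix has very many rows, X can be processed in row slices (data_csr[start:stop]) and the column blocks concatenated. If adding scikit-learn is acceptable, sklearn.metrics.pairwise.euclidean_distances also accepts sparse input and uses the same dot-product expansion.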
- Rochana Nana