1
votes

I have sparse CSR matrices (each the product of two sparse vectors) and I want to convert each matrix to a flat vector. I want to avoid using any dense representation or iterating over indices.

So far, the only solution I have come up with is to iterate over the nonzero elements using the COO representation:

import numpy
from scipy import sparse as sp

# three references to the same 2x2 CSR matrix
matrices = [sp.csr_matrix([[1,2],[3,4]])]*3
vectorSize = matrices[0].shape[0]*matrices[0].shape[1]
flatMatrixData = []
flatMatrixRows = []
flatMatrixCols = []
for i in range(len(matrices)):
    matrix = matrices[i].tocoo()
    flatMatrixData += matrix.data.tolist()
    # every nonzero of matrix i goes into row i of the result
    flatMatrixRows += [i]*matrix.nnz
    # column-major flat index: row + col * number of rows
    flatMatrixCols += [r+c*matrix.shape[0] for r,c in zip(matrix.row, matrix.col)]
flatMatrix = sp.coo_matrix((flatMatrixData,(flatMatrixRows, flatMatrixCols)), shape=(len(matrices), vectorSize), dtype=numpy.float64).tocsr()

This is unsatisfying and inelegant. Does anyone know how to achieve this efficiently?

1
Your flatMatrix is (3,4); each row is [1 3 2 4]. If a submatrix is x, then the row is x.A.T.flatten(). - hpaulj

1 Answer

2
votes

Your flatMatrix is (3,4); each row is [1 3 2 4]. If a submatrix is x, then the row is x.A.T.flatten().
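
As a quick dense check of that claim on the 2x2 example from the question (only for illustration, since the goal is to stay sparse), something like the following should do. It uses toarray() in place of the .A shorthand; the names x and row are just for this sketch.

```python
import numpy as np
from scipy import sparse as sp

x = sp.csr_matrix([[1, 2], [3, 4]])

# transpose, then flatten in row-major order: this walks x column by column
row = x.toarray().T.flatten()

# the same ordering, requested directly as a column-major (Fortran) flatten
row_f = x.toarray().flatten(order='F')
```

Both row and row_f hold the column-major ordering of x, which is the ordering the r+c*2 index in the question produces.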

F = sp.vstack([x.T.tolil().reshape((1,vectorSize)) for x in matrices])

F is the same as your flatMatrix, except that its dtype is int. I had to convert each submatrix to lil because csr does not implement reshape (in my version of sparse). I don't know whether other formats work.
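
If your SciPy is recent enough that csr matrices implement reshape and accept an order argument (an assumption about your installed version, so check it first), a sketch that skips both the transpose and the lil conversion might look like this:

```python
import numpy as np
from scipy import sparse as sp

matrices = [sp.csr_matrix([[1, 2], [3, 4]])] * 3
vectorSize = matrices[0].shape[0] * matrices[0].shape[1]

# order='F' asks for column-major flattening, matching x.T followed by a
# row-major reshape. Assumes a SciPy version whose csr reshape supports order.
F = sp.vstack(
    [x.reshape((1, vectorSize), order='F') for x in matrices]
).tocsr()
```

This still loops over the list of matrices in Python, but not over the individual entries.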

Ideally sparse would let you do the whole range of numpy array (or matrix) manipulations, but it isn't there yet.

Given the small dimensions in this example, I won't speculate on the speed of the alternatives.
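
For larger inputs, one option is to keep the COO bookkeeping from the question but build the index arrays with numpy instead of Python lists. This is only a sketch: flatten_stack is a hypothetical helper name, it assumes all matrices share one shape, and I have not timed it.

```python
import numpy as np
from scipy import sparse as sp

def flatten_stack(matrices):
    """Stack same-shaped sparse matrices as column-major flattened rows.

    Hypothetical helper; assumes a non-empty list of equally shaped matrices.
    """
    n_rows, n_cols = matrices[0].shape
    coos = [m.tocoo() for m in matrices]
    data = np.concatenate([m.data for m in coos])
    # row i of the result receives every nonzero of matrix i
    rows = np.repeat(np.arange(len(coos)), [m.nnz for m in coos])
    # column-major flat index: row + col * n_rows (int64 to avoid overflow)
    cols = np.concatenate(
        [m.row.astype(np.int64) + m.col.astype(np.int64) * n_rows for m in coos]
    )
    return sp.coo_matrix(
        (data, (rows, cols)),
        shape=(len(coos), n_rows * n_cols),
        dtype=np.float64,
    ).tocsr()

flat = flatten_stack([sp.csr_matrix([[1, 2], [3, 4]])] * 3)
```

The per-entry work happens inside numpy here; the only Python-level loops are over the list of matrices.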