Given a square matrix of dimension 1 million, I want to calculate the diagonal degree matrix.
The diagonal degree matrix is defined as a diagonal matrix whose entries are the counts of non-zero values per row.
The matrix, let's call it A, is in scipy.sparse.csr_matrix format.
If my machine had enough power, I would just do:
diagonal_degrees = []
for row in A:
    diagonal_degrees.append(numpy.sum(row != 0))
I even tried that, but it results in a
ValueError: array is too big.
So I tried to make use of scipy's sparse structure. I thought of this approach:
diagonal_degrees = []
CSC_format = A.tocsc()  # A is in scipy's CSR format.
for i in range(CSC_format.shape[0]):
    row = CSC_format.getrow(i)
    diagonal_degrees.append(numpy.sum(row != 0))
I have two questions:
- Is there a more efficient way that I may have overlooked?
- The scipy sparse docs state:
All conversions among the CSR, CSC, and COO formats are efficient, linear-time operations.
So why do I get a
SparseEfficiencyWarning: changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
when converting from CSR to CSC?
Comments:
- "Changing the sparsity structure" has nothing to do with converting between different sparse matrix formats. It's when you add "dense" item(s) to a csr_matrix. – Joe Kington
- The nonzero method looks promising. – Avaris
- diag_deg, _ = np.histogram(x.nonzero()[0], np.arange(x.shape[0]+1)) – Joe Kington
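Since both approaches in the question materialize one row object per row, a cheaper route (a sketch, not from the question itself) is to read the counts directly off the CSR index pointer: in a csr_matrix, indptr[i+1] - indptr[i] is the number of stored entries in row i, so the whole degree vector is a single O(n) diff. Calling eliminate_zeros() first ensures stored entries equal true non-zeros; the small matrix below is a hypothetical stand-in for the 1-million-row one.

```python
import numpy as np
import scipy.sparse as sp

# Small stand-in for the 1,000,000 x 1,000,000 matrix from the question.
A = sp.csr_matrix(np.array([[1, 0, 2],
                            [0, 0, 0],
                            [3, 4, 5]]))

# Drop any explicitly stored zeros, so stored entries == non-zero entries.
A.eliminate_zeros()

# indptr[i+1] - indptr[i] is the number of stored entries in row i,
# so the full degree vector is one diff over the index pointer array.
degrees = np.diff(A.indptr)

# Assemble the diagonal degree matrix in sparse form.
D = sp.diags(degrees)
```

This never converts formats and never loops in Python, so it avoids both the SparseEfficiencyWarning and the per-row overhead.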