3
votes

Given a square matrix of dimension 1 million, I want to calculate the diagonal degree matrix.

The diagonal degree matrix is defined as a diagonal matrix whose entries are the counts of non-zero values per row.

The matrix, let's call it A, is in scipy.sparse.csr_matrix format.

If my machine had enough memory, I would just do

diagonal_degrees = []
for row in A:
    diagonal_degrees.append(numpy.sum(row!=0))

I even tried that, but it results in a

ValueError: array is too big.

So I tried to make use of SciPy's sparse structure. I thought of this approach:

diagonal_degrees = []
CSC_format = A.tocsc()  # A is in SciPy's CSR format.
for i in range(CSC_format.shape[0]):
    row = CSC_format.getrow(i)
    diagonal_degrees.append(numpy.sum(row!=0))

I have two questions:

  1. Is there a more efficient way that I may have overlooked?
  2. While the docs of scipy sparse state:

All conversions among the CSR, CSC, and COO formats are efficient, linear-time operations.

Why do I get a

SparseEfficiencyWarning: changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

while changing from CSR to CSC?

1
You're getting the error when you set an item in a csr_matrix. "Changing the sparsity structure" has nothing to do with converting between different sparse matrix formats. It's when you add a "dense" item(s). - Joe Kington
If all you need to do is count the non-zero elements, the nonzero method looks promising. - Avaris
As @Avaris already pointed out, you can just do diag_deg, _ = np.histogram(x.nonzero()[0], np.arange(x.shape[0]+1)) - Joe Kington
@Joe: Please turn this into an answer, so I can vote and accept. - Aufwind
It should really be @Avaris's, in my opinion, as he was the one to point out nonzero. - Joe Kington
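To illustrate the first comment, here is a minimal sketch on a tiny 2x2 matrix (the matrix and variable names are made up for illustration). It assumes the warning comes from assigning to a position that was not previously stored, as the comment suggests, and checks that a plain CSR to CSC conversion does not trigger it:

```python
import warnings

import numpy as np
from scipy import sparse

# Tiny illustrative matrix; position (0, 1) is not stored.
A = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))

# 1) Converting CSR -> CSC only changes the storage format.
with warnings.catch_warnings(record=True) as conversion_warnings:
    warnings.simplefilter("always")
    A_csc = A.tocsc()

# 2) Assigning to an unstored position changes the sparsity structure.
with warnings.catch_warnings(record=True) as assignment_warnings:
    warnings.simplefilter("always")
    A[0, 1] = 3.0

conversion_warns = any(
    issubclass(w.category, sparse.SparseEfficiencyWarning)
    for w in conversion_warnings
)
assignment_warns = any(
    issubclass(w.category, sparse.SparseEfficiencyWarning)
    for w in assignment_warnings
)
print("conversion warns:", conversion_warns)
print("assignment warns:", assignment_warns)
```

If the assignment is really needed, building the matrix in lil_matrix format first and converting to CSR afterwards is what the warning text itself recommends.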

1 Answer

4
votes

If all you need is to count the non-zero elements, the nonzero method could be useful.

Exact code would be (with the help of Joe Kington and matehat):

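import numpy as np

# x is the sparse input matrix; count the non-zero entries in each row
diag_deg, _ = np.histogram(x.nonzero()[0], np.arange(x.shape[0]+1))

# generating a diagonal matrix with diag_deg
dim = x.shape[0]
diag_mat = np.zeros((dim**2, ))
diag_mat[np.arange(0, dim**2, dim+1)] = diag_deg
diag_mat = diag_mat.reshape((dim, dim))  # reshape returns a new array object, so assign it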
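To make the counting step concrete, here is a small sketch on a made-up 3x3 matrix. It shows what x.nonzero()[0] contains and that the histogram with integer bin edges counts entries per row. The np.bincount line is an equivalent alternative added for comparison, not part of the original answer:

```python
import numpy as np
from scipy import sparse

# Illustrative input: row 0 has two non-zeros, row 1 none, row 2 one.
x = sparse.csr_matrix(np.array([[1, 0, 2],
                                [0, 0, 0],
                                [0, 3, 0]]))

# Row index of every non-zero entry.
rows = x.nonzero()[0]

# One bin per row: edges 0, 1, ..., n_rows.
diag_deg, _ = np.histogram(rows, np.arange(x.shape[0] + 1))

# Same counts via bincount; minlength keeps trailing empty rows.
diag_deg_bincount = np.bincount(rows, minlength=x.shape[0])

print(diag_deg)           # [2 0 1]
print(diag_deg_bincount)  # [2 0 1]
```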

For large arrays (dim ~ 1 million), as noted by Aufwind, the dense np.zeros((dim**2, )) call raises ValueError: Maximum allowed dimension exceeded, since it would need dim**2 = 10**12 entries. A workaround is to use sparse matrices:

from scipy import sparse

diag_mat = sparse.coo_matrix((dim, dim))
diag_mat.setdiag(diag_deg)
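As a further sketch that stays sparse from start to finish (not from the original answer): a CSR matrix already stores, in its indptr array, where each row starts and ends, so the number of stored entries per row is just the difference of consecutive indptr values. The helper name degree_matrix is hypothetical, and the code assumes explicitly stored zeros should not count as non-zero, so it removes them from a copy first:

```python
import numpy as np
from scipy import sparse


def degree_matrix(A):
    """Sparse diagonal matrix of per-row non-zero counts of A (hypothetical helper)."""
    A = sparse.csr_matrix(A, copy=True)  # work on a CSR copy, leave the input untouched
    A.eliminate_zeros()                  # drop explicitly stored zeros so they are not counted
    deg = np.diff(A.indptr)              # stored entries per row
    # Pass the dtype explicitly so the integer counts are kept as integers.
    return sparse.diags(deg, 0, format="csr", dtype=deg.dtype)


# Small usage example with made-up data.
A = sparse.csr_matrix(np.array([[1, 0, 2],
                                [0, 0, 0],
                                [0, 3, 0]]))
D = degree_matrix(A)
print(D.toarray())
```

Nothing here allocates a dense dim x dim array, so memory use should scale with the number of stored entries rather than with dim**2. For a CSR matrix without explicit zeros, A.getnnz(axis=1) should give the same per-row counts.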