
I have a sparse CSC matrix, "A", in which I want to replace the first row with a vector that is all zeros, except for the first entry which is 1.

So far I am doing the inefficient version, e.g.:

import numpy as np
from scipy.sparse import csc_matrix

row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))
replace = np.zeros(3)
replace[0] = 1 
A[0,:] = replace
A.eliminate_zeros()

But I'd like to do it with .indptr, .data, etc. As it is a CSC, I am guessing that this might be inefficient as well? In my exact problem, the matrix is 66000 X 66000.

For a CSR sparse matrix I've seen it done as

A.data[1:A.indptr[1]] = 0
A.data[0] = 1.0
A.indices[0] = 0
A.eliminate_zeros()

So, basically I'd like to do the same for a CSC sparse matrix.
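For reference, the CSR recipe above can be wrapped with a guard. This is only a sketch: the helper name `set_first_row_to_e0_csr` is hypothetical, and the recipe relies on row 0 already holding at least one stored entry, because it reuses that slot for the 1 in column 0.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_first_row_to_e0_csr(A):
    """Overwrite row 0 of a CSR matrix in place with [1, 0, ..., 0].

    Hypothetical helper; assumes row 0 has at least one stored entry.
    """
    end = A.indptr[1]              # row 0 occupies data[0:end]
    if end == 0:
        raise ValueError("row 0 has no stored entries to reuse")
    A.data[:end] = 0               # zero every stored entry of row 0
    A.data[0] = 1                  # reuse the first slot for the value 1 ...
    A.indices[0] = 0               # ... and point it at column 0
    A.eliminate_zeros()            # drop the explicit zeros from storage
    return A

row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A_csr = set_first_row_to_e0_csr(csr_matrix((data, (row, col)), shape=(3, 3)))
print(A_csr.toarray())
```

The work is proportional to the number of stored entries in row 0 plus the cost of `eliminate_zeros()`, which is why this layout is convenient for row edits.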

Expected result: To do exactly the same as above, just more efficiently (applicable to very large sparse matrices).

That is, start with:

[1, 0, 4],
[0, 0, 5],
[2, 3, 6]

and replace the top row with a vector whose length equals the number of columns and which is all zeros except for a 1 in the first position. One should then end up with

[1, 0, 0],
[0, 0, 5],
[2, 3, 6]

And be able to do it for large sparse CSC matrices efficiently.

Thanks in advance :-)

1
I'm not sure that working directly with the indptr will be faster than using A[0,:]=. Creating the dense replace is efficient enough. Something to keep in mind is that csc indexing does not remove the new 0's from the sparsity structure. That requires a separate 'remove zeros' method call. – hpaulj
Even in the case where the dimension is much larger? I.e. 66000 x 66000 – user469216
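The point about zeros staying in the sparsity structure can be checked by counting stored entries. A small sketch using the question's example matrix; here the row is zeroed through `.data` directly (an assumption made for illustration, not the `A[0,:] =` assignment itself), and the variable names are made up:

```python
import numpy as np
from scipy.sparse import csc_matrix

row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))

nnz_start = A.nnz                  # stored entries before any change
A.data[A.indices == 0] = 0         # zero the values stored in row 0
nnz_zeroed = A.nnz                 # unchanged: the zeros are still stored
A.eliminate_zeros()                # remove them from the sparsity structure
nnz_final = A.nnz
print(nnz_start, nnz_zeroed, nnz_final)
```

`nnz` counts stored values, explicit zeros included, so it only drops once `eliminate_zeros()` has run.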

1 Answer


You can do it with indptr and indices. First, the matrix can be constructed directly from the data, indices and indptr arrays:

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, indices, indptr), shape=(3,3))
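To see how these three arrays describe the matrix, it may help to walk the columns one by one. A minimal sketch; the `columns` dictionary is only an illustration aid, not part of any SciPy API:

```python
import numpy as np
from scipy.sparse import csc_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, indices, indptr), shape=(3, 3))

# Column j owns the slice indptr[j]:indptr[j + 1] of indices and data.
columns = {}
for j in range(A.shape[1]):
    start, stop = A.indptr[j], A.indptr[j + 1]
    rows = A.indices[start:stop].tolist()   # row index of each stored value
    vals = A.data[start:stop].tolist()      # the stored values themselves
    columns[j] = list(zip(rows, vals))
    print(f"column {j}: (row, value) pairs {columns[j]}")
```

Because `indices` holds row numbers, the entries of a given row are scattered across all columns, which is why a row edit in CSC has to scan `indices` rather than slice a contiguous range.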

To change the first row, note that in CSC format indices holds the row index of each stored value, so the entries of row 0 are exactly those where indices is 0. Setting the corresponding data values to zero clears the row:

data[indices == 0] = 0

The line above sets every stored element of the first row to 0, including A[0, 0]. To keep the first element, exclude it from the mask:

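indices_tmp = indices == 0    # mask of the stored entries that lie in row 0
indices_tmp[0] = False        # keep the first stored entry; it is A[0, 0] only because column 0 stores a row-0 value here
data[indices_tmp] = 0
A = csc_matrix((data, indices, indptr), shape=(3, 3))
A.eliminate_zeros()           # drop the zeroed entries from the sparsity structure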
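The snippet above depends on A[0, 0] already being stored and equal to 1. A more general sketch, under the assumption that building a new matrix is acceptable, clears row 0 and then inserts a fresh entry at (0, 0). The helper name `set_first_row_to_e0_csc` is hypothetical:

```python
import numpy as np
from scipy.sparse import csc_matrix

def set_first_row_to_e0_csc(A):
    """Return a CSC matrix equal to A with row 0 replaced by [1, 0, ..., 0].

    Hypothetical helper; works whether or not A[0, 0] is currently stored.
    """
    A = A.copy()
    A.data[A.indices == 0] = 0     # zero every stored entry in row 0
    A.eliminate_zeros()            # drop them; row 0 is now empty
    # Add one new entry at (0, 0). It becomes the first stored entry of
    # column 0, so every later column pointer shifts by one.
    data = np.concatenate((np.ones(1, dtype=A.dtype), A.data))
    indices = np.concatenate((np.zeros(1, dtype=A.indices.dtype), A.indices))
    indptr = A.indptr.copy()
    indptr[1:] += 1
    return csc_matrix((data, indices, indptr), shape=A.shape)

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A_new = set_first_row_to_e0_csc(csc_matrix((data, indices, indptr), shape=(3, 3)))
print(A_new.toarray())

# A matrix whose (0, 0) entry is not stored is handled the same way.
B_new = set_first_row_to_e0_csc(csc_matrix(np.array([[0, 7, 0], [3, 0, 0], [0, 0, 9]])))
print(B_new.toarray())
```

Everything here is a pass over the stored entries, so no dense vector of length 66000 is needed and the cost scales with the number of nonzeros rather than with the matrix dimension.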

Hope it helps.