Find n random zero elements in a scipy sparse csr_matrix

Question

I want to find n random zero elements in a sparse matrix. I wrote the code below:

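from random import randint  # randint(a, b) includes both endpoints

result = []   # collected [row, col] pairs
counter = 0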
while counter < n:
    r = randint(0, W.shape[0]-1)
    c = randint(0, W.shape[1]-1)
    if W[r,c] == 0:
        result.append([r,c])
        counter += 1

Unfortunately, it is very slow. I want something more efficient. Is there any way to access the zero elements of a scipy sparse matrix quickly?

Is the matrix very sparse, or quite dense? Is n small or large compared to the number of zero elements in W? Different methods will be fastest depending on which regime we are in. — unutbu
@unutbu It is very sparse (density = 0.02), and I want to select about one percent of the zero elements. Also, the array is large (1000 * 50000). — zahra
Could you instead simply generate a random matrix with density 0.01 and add that to this one? It would be faster. There will be some overlaps, but the 'randomness' might be just as good, if not better. — hpaulj
Some inequality tests on a sparse matrix can produce a matrix that is mostly True, namely when all the 0's satisfy the test. The result is a very unsparse sparse matrix. — hpaulj
Keep in mind that in a sparse matrix, the zeros are defined by what isn't there. It has a record of the nonzeros, which is much smaller. — hpaulj
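The suggestion in the comments above (draw random positions at density 0.01, then deal with the overlaps) might look roughly like the sketch below. It is only an illustration under assumptions: the small stand-in matrix W, the seeds, and the variable names are made up here, and the overlaps are simply discarded, so the number of positions returned is slightly below the number drawn.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for the asker's matrix (smaller shape, same density).
W = sp.random(100, 5000, density=0.02, format="csr", random_state=0)

# Random candidate positions: about 1% of all cells, with no repeats.
R = sp.random(W.shape[0], W.shape[1], density=0.01, format="coo", random_state=1)

# Positions where W holds a nonzero value (explicitly stored zeros are skipped).
nz_rows, nz_cols = W.nonzero()

# Encode (row, col) pairs as flat indices so the two sets can be compared.
n_cols = W.shape[1]
cand_flat = R.row.astype(np.int64) * n_cols + R.col
nz_flat = nz_rows.astype(np.int64) * n_cols + nz_cols

# Discard the overlaps: candidates that landed on a nonzero of W.
keep = ~np.isin(cand_flat, nz_flat)
zero_positions = np.column_stack((R.row[keep], R.col[keep]))
```

Only the stored nonzeros of W are ever touched, so the cost scales with the number of nonzeros plus the number of candidates rather than with the full 1000 * 50000 grid.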

fountainhead · Accepted Answer · 2019-03-16T16:39:31

First, here's some code to create some sample data:

import numpy as np
rows, cols = 10,20   # Shape of W
nonzeros = 7         # How many nonzeros exist in W
zeros = 70           # How many zeros we want to randomly select

W = np.zeros((rows,cols), dtype=int)
nonzero_rows = np.random.randint(0, rows, size=(nonzeros,))
nonzero_cols = np.random.randint(0, cols, size=(nonzeros,))
W[nonzero_rows, nonzero_cols] = 20

The code above creates W as a dense NumPy array that is mostly zeros, with shape (10,20) and at most 7 non-zero elements out of 200 (fewer if the random positions happen to repeat). Every non-zero element has the value 20.

Here's the solution to pick zeros=70 zero elements from this array:

argwhere_res = np.argwhere(np.logical_not(W))
zero_count = len(argwhere_res)
ids = np.random.choice(zero_count, size=zeros, replace=False)  # replace=False: no position is picked twice
res = argwhere_res[ids]

res is now an array of shape (70,2) giving the (row, column) locations of the 70 zero elements randomly chosen from W.

Note that this does not involve any loops.
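The code above works on a dense array. For an actual csr_matrix of the size mentioned in the question, one possible adaptation is to sample flat cell indices and filter out the stored nonzeros, so the matrix is never densified. The sketch below assumes this approach. sample_zero_positions is a hypothetical helper name, not a SciPy function, and the example matrix at the bottom is made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

def sample_zero_positions(W, n, rng=None):
    """Return an (n, 2) array of distinct (row, col) positions where W is zero.

    Hypothetical helper for illustration; not part of SciPy.
    """
    rng = np.random.default_rng(rng)
    n_rows, n_cols = W.shape
    total = n_rows * n_cols

    # Flat indices of the true nonzeros (W.nonzero() skips stored zeros).
    nz_rows, nz_cols = W.nonzero()
    nz_flat = nz_rows.astype(np.int64) * n_cols + nz_cols

    if n > total - nz_flat.size:
        raise ValueError("W has fewer than n zero elements")

    # Draw n + nnz distinct cells: at most nnz of them can be nonzero,
    # so at least n zero cells are guaranteed to survive the filter.
    draws = rng.choice(total, size=n + nz_flat.size, replace=False)
    zero_flat = draws[~np.isin(draws, nz_flat)][:n]

    # Decode flat indices back into (row, col) pairs.
    return np.column_stack(np.divmod(zero_flat, n_cols))

# Usage on a made-up matrix with the density from the question.
W = sp.random(1000, 5000, density=0.02, format="csr", random_state=0)
res = sample_zero_positions(W, n=10_000, rng=1)
```

Drawing n + nnz cells up front avoids a retry loop: even if every stored nonzero were hit, n zero cells would still remain after filtering.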
