2
votes

I want to find n zero elements in a sparse matrix. I write the code below:

counter = 0
while counter < n:
    r = randint(0, W.shape[0]-1)
    c = randint(0, W.shape[1]-1)
    if W[r,c] == 0:
        result.append([r,c])
        counter += 1

Unfortunately, it is very slow. I want something more efficient. Is there any way to access zero elements from scipy sparse matrix quickly?

3
Is the matrix very sparse, or quite dense? Is n small or large compared to the number of zero elements in W? Different methods will be fastest depending on which regime we are in.unutbu
@unutbu It is very sparse(density = 0.02). and I want to select about one percent of the zero elements. Also, the dimension of the array is high(1000 * 50000)zahra
Could you instead simply generate a random matrix with density 0.01, and add that to this? It would be faster. Though there will be some overlaps, but the 'randomness' might be just as good, if not better.hpaulj
Some in equality tests on a sparse matrix can produce matrix that is mostly True's - if all the 0's satisfy the test. the result is a very unsparse sparse matrix.hpaulj
Keep in mind that in a sparse matrix, the zeros are defined by what isn't there. It has a record of the nonzeros, which is much smaller.hpaulj

3 Answers

1
votes

First, here's some code to create some sample data:

import numpy as np
rows, cols = 10,20   # Shape of W
nonzeros = 7         # How many nonzeros exist in W
zeros = 70           # How many zeros we want to randomly select

W = np.zeros((rows,cols), dtype=int)
nonzero_rows = np.random.randint(0, rows, size=(nonzeros,))
nonzero_cols = np.random.randint(0, cols, size=(nonzeros,))
W[nonzero_rows, nonzero_cols] = 20

The above code has created W as a sparse numpy array, having shape (10,20), and having only 7 non-zero elements (out of the 200 elements). All the non-zero elements have a value 20.

Here's the solution to pick zeros=70 zero elements from this sparse matrix:

argwhere_res = np.argwhere(np.logical_not(W))
zero_count = len(argwhere_res)
ids = np.random.choice(range(zero_count), size=(zeros,))
res = argwhere_res[ids]

res would now be a shape (70,2) array giving the locations of the 70 elements that we have randomly chosen from W.

Note that this does not involve any loops.

0
votes
import numpy as np
import scipy.sparse as sparse
import random
randint = random.randint

def orig(W, n):
    result = list()
    while len(result) < n:
        r = randint(0, W.shape[0]-1)
        c = randint(0, W.shape[1]-1)
        if W[r,c] == 0:
            result.append((r,c))
    return result

def alt(W, n):
    nrows, ncols = W.shape
    density = n / (nrows*ncols - W.count_nonzero())
    W = W.copy()
    W.data[:] = 1
    W2 = sparse.csr_matrix((nrows, ncols))
    while W2.count_nonzero() < n:
        W2 += sparse.random(nrows, ncols, density=density, format='csr')
        # remove nonzero values from W2 where W is 1
        W2 -= W2.multiply(W)
    W2 = W2.tocoo()    
    r = W2.row[:n]
    c = W2.col[:n]
    result = list(zip(r, c))
    return result

def alt_with_dupes(W, n):
    nrows, ncols = W.shape
    density = n / (nrows*ncols - W.count_nonzero())
    W = W.copy()
    W.data[:] = 1
    W2 = sparse.csr_matrix((nrows, ncols))
    while W2.data.sum() < n:
        tmp = sparse.random(nrows, ncols, density=density, format='csr')
        tmp.data[:] = 1
        W2 += tmp
        # remove nonzero values from W2 where W is 1
        W2 -= W2.multiply(W)
    W2 = W2.tocoo()
    num_repeats = W2.data.astype('int')
    r = np.repeat(W2.row, num_repeats)
    c = np.repeat(W2.col, num_repeats)
    idx = np.random.choice(len(r), n)
    result = list(zip(r[idx], c[idx]))
    return result

Here's a benchmark with:

W = sparse.random(1000, 50000, density=0.02, format='csr')
n = int((np.multiply(*W.shape) - W.nnz)*0.01)

In [194]: %timeit alt(W, n)
809 ms ± 261 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [195]: %timeit orig(W, n)
11.2 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [223]: %timeit alt_with_dupes(W, n)
986 ms ± 290 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Note that alt returns a list with no duplicates. Both orig and alt_with_dupes may return duplicates.

0
votes

First make a list of all the 0's:

list_0s = [(j, i) for i in range(len(matrix[j])) for j in range len(matrix) if matrix[j,i] == 0]

Then get your random choices:

random_0s = random.choices(list_0s, k=n)

Testing this with:

 matrix = np.random.randint(1000, size=(1000,1000))
 n = 100

Takes 0.34 seconds.