To manipulate SciPy matrices, the built-in methods are typically used. But sometimes you need to read the matrix data to assign it to non-sparse data types. For the sake of demonstration, I created a random LIL sparse matrix and converted it to a NumPy array (pure Python data types would have made more sense!) using different methods.
```python
from __future__ import print_function
from scipy.sparse import rand, csr_matrix, lil_matrix
import numpy as np

dim = 1000
# 1000 x 1000 random LIL matrix with 1% of its entries nonzero
lil = rand(dim, dim, density=0.01, format='lil', dtype=np.float32, random_state=0)
print('number of nonzero elements:', lil.nnz)

# dense target array
arr = np.zeros(shape=(dim,dim), dtype=float)
```

```
number of nonzero elements: 10000
```
Reading by indexing
```python
%%timeit -n3
# visit every (i, j) position, including the zeros
for i in range(dim):
    for j in range(dim):
        arr[i,j] = lil[i,j]
```

```
3 loops, best of 3: 6.42 s per loop
```
Using the nonzero() method
```python
%%timeit -n3
nnz = lil.nonzero() # indices of nonzero values
for i, j in zip(nnz[0], nnz[1]):
    arr[i,j] = lil[i,j]
```

```
3 loops, best of 3: 75.8 ms per loop
```
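The `nonzero()` loop still pays for one `lil[i,j]` lookup per stored element. A possible alternative (a sketch, not benchmarked here) is to convert once with `tocoo()`, whose `row`, `col` and `data` arrays list every stored entry as parallel arrays, and read the values from there. Since pure Python types were mentioned as the real target, this version first builds a plain dict; the name `entries` is just for illustration.

```python
import numpy as np
from scipy.sparse import rand

dim = 1000
lil = rand(dim, dim, density=0.01, format='lil', dtype=np.float32, random_state=0)

# One conversion; coo.row, coo.col and coo.data are parallel 1-D arrays
# holding the coordinates and values of the stored entries.
coo = lil.tocoo()

# Plain Python dict {(row, col): value}, built without indexing the sparse matrix.
entries = {(int(i), int(j)): float(v)
           for i, j, v in zip(coo.row, coo.col, coo.data)}

# The same data copied into the dense array used in the question.
arr = np.zeros(shape=(dim, dim), dtype=float)
for (i, j), v in entries.items():
    arr[i, j] = v
```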
Using the built-in method to convert directly to array
This is not a general way to read the matrix data, so it does not really count as a solution; it is shown only for comparison.
```python
%timeit -n3 arr = lil.toarray()
```

```
3 loops, best of 3: 7.85 ms per loop
```
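When the destination is a NumPy array anyway, the COO coordinate arrays can be used directly as fancy indices, so all stored values are written in a single NumPy assignment instead of a Python loop. This is a sketch with the same limitation as `toarray()` (it only helps for NumPy targets), and it assumes there are no duplicate coordinates; that holds for a COO matrix obtained from LIL, but with duplicates fancy assignment would keep one value rather than summing them.

```python
import numpy as np
from scipy.sparse import rand

dim = 1000
lil = rand(dim, dim, density=0.01, format='lil', dtype=np.float32, random_state=0)

coo = lil.tocoo()  # stored entries as (row, col, data) arrays
arr = np.zeros(shape=(dim, dim), dtype=float)

# Vectorized scatter: arr[row[k], col[k]] = data[k] for every k at once.
arr[coo.row, coo.col] = coo.data
```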
Reading Scipy sparse matrices with these methods is not efficient at all. Is there any faster way to read these matrices?
arr? That's not fast. But in general indexing a sparse matrix is slower than indexing a dense array. If you really need speed, work directly with the data attributes of the matrix - at the expense of generality. - hpaulj
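Following the suggestion to work with the data attributes, here is a sketch for two formats. In LIL, `rows[i]` is the list of column indices stored in row `i` and `data[i]` the matching values; in CSR, the entries of row `i` sit at positions `indptr[i]:indptr[i+1]` of `indices` and `data`. The helper names `fill_from_lil` and `fill_from_csr` are hypothetical, and the CSR version assumes canonical format (no duplicate entries). Neither is timed here; each trades generality for skipping per-element sparse indexing.

```python
import numpy as np
from scipy.sparse import rand


def fill_from_lil(lil, arr):
    """Hypothetical helper: copy stored LIL entries into arr via rows/data."""
    # lil.rows[i]: column indices stored in row i
    # lil.data[i]: the corresponding values
    for i, (cols, vals) in enumerate(zip(lil.rows, lil.data)):
        for j, v in zip(cols, vals):
            arr[i, j] = v
    return arr


def fill_from_csr(csr, arr):
    """Hypothetical helper: copy stored CSR entries into arr via indptr/indices/data."""
    for i in range(csr.shape[0]):
        # entries of row i occupy indptr[i]:indptr[i + 1]
        start, end = csr.indptr[i], csr.indptr[i + 1]
        for j, v in zip(csr.indices[start:end], csr.data[start:end]):
            arr[i, j] = v
    return arr


dim = 1000
lil = rand(dim, dim, density=0.01, format='lil', dtype=np.float32, random_state=0)
arr_lil = fill_from_lil(lil, np.zeros((dim, dim), dtype=float))
arr_csr = fill_from_csr(lil.tocsr(), np.zeros((dim, dim), dtype=float))
```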