One of the best ways to build a scipy sparse matrix is with the coo_matrix constructor, i.e.
coo_matrix((data, (i, j)), [shape=(M, N)])
where:
data[:] are the entries of the matrix, in any order
i[:] are the row indices of the matrix entries
j[:] are the column indices of the matrix entries
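For a small in-memory case the construction looks like this (the values here are just made up for illustration):

    import numpy as np
    from scipy.sparse import coo_matrix

    # Three nonzero entries of a 4x4 matrix, given as parallel arrays.
    data = np.array([4.0, 5.0, 7.0])
    i = np.array([0, 3, 1])      # row indices
    j = np.array([0, 3, 2])      # column indices

    A = coo_matrix((data, (i, j)), shape=(4, 4))
    print(A.toarray())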
But if the matrix is very large, it is not practical to load the entire i, j and data vectors into memory.
How do you build a coo_matrix so that (data, (i, j)) is fed from disk (via an iterator or generator), with the array/vector objects on disk stored in either .npy or pickle format?
Pickle is the better option, as numpy.save/load are not optimized for scipy sparse. Maybe there is another, faster format.
Both numpy.genfromtxt() and numpy.loadtxt() are cumbersome, slow and memory hogs.
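For context, this is roughly the kind of thing I have in mind, assuming the three vectors were saved with numpy.save as row.npy, col.npy and data.npy (placeholder names) and the final shape is known in advance. np.load with mmap_mode='r' keeps the arrays on disk and only reads the slices actually touched:

    import numpy as np
    from scipy.sparse import coo_matrix, csr_matrix

    # Placeholder file names; each is assumed to be a 1-D array of the same
    # length, saved with numpy.save (indices as integers, data as floats).
    row = np.load('row.npy', mmap_mode='r')    # memory-mapped, stays on disk
    col = np.load('col.npy', mmap_mode='r')
    data = np.load('data.npy', mmap_mode='r')

    M, N = 1000000, 1000000                    # assumed final matrix shape
    chunk = 10000000                           # entries processed per block

    # Accumulate the matrix block by block; slicing a memmap only reads the
    # touched pages, so peak memory is bounded by the chunk size.
    result = csr_matrix((M, N))
    for start in range(0, len(data), chunk):
        stop = start + chunk
        block = coo_matrix(
            (data[start:stop], (row[start:stop], col[start:stop])),
            shape=(M, N),
        )
        result = result + block.tocsr()        # duplicate entries are summed

The repeated sparse addition gets slow as the number of nonzeros grows, so a faster or more idiomatic way to stream (data, (i, j)) into the constructor would be welcome.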