0
votes

I want to convert a sparse csr 2d input matrix of feature vectors into a sparse csr 2d matrix of sliding window feature vectors. So to take a non-sparse example for a window of size 2:

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

becomes:

array([[0, 1, 2, 3, 4, 5],
       [3, 4, 5, 6, 7, 8]])

The following function achieves this for 2d numpy arrays:

import numpy as np
def window_stack(a, width=2):
    n = a.shape[0]
    return np.hstack(a[i:1+n+i-width:1] for i in range(0, width))

However in my case I have a large sparse csr matrix. How can I modify the window_stack function to work with large sparse csr matrices?

I can't afford to make the sparse array dense as an intermediate step as this would be much too large.

1
Okay. I really didn't see that since it was literally the last line on the bottom. Do make it more clear next time ;-) - cs95
@cᴏʟᴅsᴘᴇᴇᴅ It's also in the title but I take your point :) I did like your numpy trick however. - eleanora

1 Answers

1
votes

You can replicate the dense calculation with sparse.hstack:

In [1]: A = np.arange(9).reshape(3,3)
In [2]: from scipy import sparse
In [3]: M = sparse.csr_matrix(A)
In [4]: M
Out[4]: 
<3x3 sparse matrix of type '<class 'numpy.int32'>'
    with 8 stored elements in Compressed Sparse Row format>
In [5]: M.A
Out[5]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]], dtype=int32)
In [6]: [A[i:1+3+i-2,:] for i in range(2)]
Out[6]: 
[array([[0, 1, 2],
        [3, 4, 5]]), array([[3, 4, 5],
        [6, 7, 8]])]
In [7]: [M[i:1+3+i-2,:] for i in range(2)]
Out[7]: 
[<2x3 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>,
 <2x3 sparse matrix of type '<class 'numpy.int32'>'
    with 6 stored elements in Compressed Sparse Row format>]
In [8]: sparse.hstack([M[i:1+3+i-2,:] for i in range(2)])
Out[8]: 
<2x6 sparse matrix of type '<class 'numpy.int32'>'
    with 11 stored elements in COOrdinate format>
In [9]: _.A
Out[9]: 
array([[0, 1, 2, 3, 4, 5],
       [3, 4, 5, 6, 7, 8]], dtype=int32)

I won't make any speed promises.

The sparse matrix multiplication is well developed (provided the matrix is sparse, which this example isn't). In fact operations like row/column sum, and even slicing is done with matrix multiplication. That is it constructs a matrix or vector that when dotted sums or selects values.

sparse.hstack uses a sparse block matrix operation. This converts the inputs to coo format, and combines the respective row,col,data arrays with offsets, and makes a new coo matrix.

I can imagine combining these steps, but it would take a fair amount of work.