8
votes

I have large but sparse arrays and I want to rearrange them by swapping rows an columns. What is a good way to do this in scipy.sparse?

Some issues

  • I don't think that permutation matrices are well suited for this task, as they like randomly change the sparsity structure. And a manipulation will always 'multiply' all columns or rows, even if there are only a few swaps necessary.

  • What is the best sparse matrix representation in scipy.sparse for this task?

  • Suggestions for implementation are very welcome.

I have tagged this with Matlab as well, since this question might find an answer that is not necessarily scipy specific.

3
I need this for a particular implementation. However, as a colleague pointed out to me, in general, one would not do permutations on a sparse matrix. A sparse matrix A is generally used as a linear map y=Ax, e.g. in iterative solvers. Thus this swapping is better realized by writing a wrapper around A, swapping the entries of the input vector x (this is column swapping in A) or the entries of y (this is row swapping). - Jan

3 Answers

5
votes

CSC format keeps a list of the row indices of all non-zero entries, CSR format keeps a list of the column indices of all non-zero entries. I think you can take advantage of that to swap things around as follows, and I think there shouldn't be any side-effects to it:

def swap_rows(mat, a, b) :
    mat_csc = scipy.sparse.csc_matrix(mat)
    a_idx = np.where(mat_csc.indices == a)
    b_idx = np.where(mat_csc.indices == b)
    mat_csc.indices[a_idx] = b
    mat_csc.indices[b_idx] = a
    return mat_csc.asformat(mat.format)

def swap_cols(mat, a, b) :
    mat_csr = scipy.sparse.csr_matrix(mat)
    a_idx = np.where(mat_csr.indices == a)
    b_idx = np.where(mat_csr.indices == b)
    mat_csr.indices[a_idx] = b
    mat_csr.indices[b_idx] = a
    return mat_csr.asformat(mat.format)

You could now do something like this:

>>> mat = np.zeros((5,5))
>>> mat[[1, 2, 3, 3], [0, 2, 2, 4]] = 1
>>> mat = scipy.sparse.lil_matrix(mat)
>>> mat.todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.],
        [ 0.,  0.,  0.,  0.,  0.]])
>>> swap_rows(mat, 1, 3)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in LInked List format>
>>> swap_rows(mat, 1, 3).todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.]])
>>> swap_cols(mat, 0, 4)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in LInked List format>
>>> swap_cols(mat, 0, 4).todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.]])

I have used a LIL matrix to show how you could preserve the type of your output. In your application you probably want to already be in CSC or CSR format, and select whether to swap rows or columns first based on it, to minimize conversions.

0
votes

In Matlab you can just index the columns and rows the way you like:

Matrix = speye(10);
mycolumnorder = [1 2 3 4 5 6 10 9 8 7];
myroworder = [4 3 2 1 5 6 7 8 9 10];
Myorderedmatrix = Matrix(myroworder,mycolumnorder);

I think this preserves sparsity... Don't know about scipy though...

0
votes

I've found using matrix operations to be the most efficient. Here's a function which will permute the rows and/or columns to a specified order. It can be modified to swap two specific rows/columns if you would like.

from scipy import sparse

def permute_sparse_matrix(M, row_order=None, col_order=None):
    """
    Reorders the rows and/or columns in a scipy sparse matrix to the specified order.
    """
    if row_order is None and col_order is None:
        return M
    
    new_M = M
    if row_order is not None:
        I = sparse.eye(M.shape[0]).tocoo()
        I.row = I.row[row_order]
        new_M = I.dot(new_M)
    if col_order is not None:
        I = sparse.eye(M.shape[1]).tocoo()
        I.col = I.col[col_order]
        new_M = new_M.dot(I)
    return new_M