Column/row slicing a torch sparse tensor

Question

I have a pytorch sparse tensor that I need sliced row/column wise using this slice [idx][:,idx] where idx is a list of indexes, using the mentioned slice yields my desired result on an ordinary float tensor. Is it possible applying the same slicing on a sparse tensor? Example here:

#constructing sparse matrix
i = np.array([[0,1,2,2],[0,1,2,1]])
v = np.ones(4)
i = torch.from_numpy(i.astype("int64"))
v = torch.from_numpy(v.astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)

#constructing float tensor
test2 = np.array([[1,0,0],[0,1,0],[0,1,1]])
test2 = autograd.Variable(torch.cuda.FloatTensor(test2), requires_grad=False)

#slicing
idx = [1,2]
print(test2[idx][:,idx])

output:

Variable containing:
 1  0
 1  1
[torch.cuda.FloatTensor of size 2x2 (GPU 0)]

I am holding a 250.000 x 250.000 adjacency matrix, where I need to slice n rows and n columns, using the random idx, by simply sampling n random idx's. Since the dataset is so large it is not realistic to convert to a more convenient datatype.

can I achieve the same slicing result on test1? Is it even possible? If not, are there any work-arounds?

Right now I am running my model with the following "hack" of a solution:

idx = sorted(random.sample(range(0, np.shape(test1)[0]), 9000))
test1 = test1AsCsr[idx][:,idx].todense().astype("int32")
test1 = autograd.Variable(torch.cuda.FloatTensor(test1), requires_grad=False)

Where test1AsCsr is my test1 converted to a numpy CSR matrix. This solution works, it is however very slow, and makes my GPU utilization very low, since it needs to read/write from CPU memory, constantly.

Edit: Its fine with a non-sparse tensor as result

I am not sure there is an easy way to do so for now (except by manually searching in i for the values in idx). There's an issue covering this problem. — benjaminplanche
I need this problem solved badly, so any solution, even if it isn't an easy one, would be more than welcome. How would you go about constructing the newly sliced matrix from the i vector? — NicolaiF
By "not easy", I meant "not optimal"... I'm not sure how the suggested workaround may work on larger tensors (if it even works the way you want). — benjaminplanche
I have attempted a solution using test1._indices() and np.where(), then constructing a new tensor from the results, this is however to slow, since the np.where() search is linear, and does not operate on the GPU. — NicolaiF
scipy.sparse.csr uses matrix multiplication to select multiple rows or columns. It constructs an extractor sparse matrix. I demonstrate it here: stackoverflow.com/questions/39500649/… — hpaulj

benjaminplanche benjaminplanche · Accepted Answer · 2018-06-04T12:44:11

Possible answer for 2-dimentional sparse indices

Find an answer below, playing with several pytorch methods (torch.eq(), torch.unique(), torch.sort(), etc.) in order to output a compact, sliced tensor of shape (len(idx), len(idx)).

I tested several edge cases (unordered idx, v with 0s, i with multiple same index pairs, etc.), though I may have forgot some. Performance should also be checked.

import torch
import numpy as np

def in1D(x, labels):
    """
    Sub-optimal equivalent to numpy.in1D().
    Hopefully this feature will be properly covered soon
    c.f. https://github.com/pytorch/pytorch/issues/3025
    Snippet by Aron Barreira Bordin
    Args:
        x (Tensor):             Tensor to search values in
        labels (Tensor/list):   1D array of values to search for

    Returns:
        Tensor: Boolean tensor y of same shape as x, with y[ind] = True if x[ind] in labels

    Example:
        >>> in1D(torch.FloatTensor([1, 2, 0, 3]), [2, 3])
        FloatTensor([False, True, False, True])
    """
    mapping = torch.zeros(x.size()).byte()
    for label in labels:
        mapping = mapping | x.eq(label)
    return mapping


def compact1D(x):
    """
    "Compact" values 1D uint tensor, so that all values are in [0, max(unique(x))].
    Args:
        x (Tensor): uint Tensor

    Returns:
        Tensor: uint Tensor of same shape as x

    Example:
        >>> densify1D(torch.ByteTensor([5, 8, 7, 3, 8, 42]))
        ByteTensor([1, 3, 2, 0, 3, 4])
    """
    x_sorted, x_sorted_ind = torch.sort(x, descending=True)
    x_sorted_unique, x_sorted_unique_ind = torch.unique(x_sorted, return_inverse=True)
    x[x_sorted_ind] = x_sorted_unique_ind
    return x

# Input sparse tensor:
i = torch.from_numpy(np.array([[0,1,4,3,2,1],[0,1,3,1,4,1]]).astype("int64"))
v = torch.from_numpy(np.arange(1, 7).astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)
print(test1.to_dense())
# tensor([[ 1.,  0.,  0.,  0.,  0.],
#         [ 0.,  8.,  0.,  0.,  0.],
#         [ 0.,  0.,  0.,  0.,  5.],
#         [ 0.,  4.,  0.,  0.,  0.],
#         [ 0.,  0.,  0.,  3.,  0.]])

# note: test1[1, 1] = v[i[1,:]] + v[i[6,:]] = 2 + 6 = 8
#       since both i[1,:] and i[6,:] are [1,1]

# Input slicing indices:
idx = [4,1,3]

# Getting the elements in `i` which correspond to `idx`:
v_idx = in1D(i, idx).byte()
v_idx = v_idx.sum(dim=0).squeeze() == i.size(0) # or `v_idx.all(dim=1)` for pytorch 0.5+
v_idx = v_idx.nonzero().squeeze()

# Slicing `v` and `i` accordingly:
v_sliced = v[v_idx]
i_sliced = i.index_select(dim=1, index=v_idx)

# Building sparse result tensor:
i_sliced[0] = compact1D(i_sliced[0])
i_sliced[1] = compact1D(i_sliced[1])

# To make sure to have a square dense representation:
size_sliced = torch.Size([len(idx), len(idx)])
res = torch.sparse.FloatTensor(i_sliced, v_sliced, size_sliced)

print(res)
# torch.sparse.FloatTensor of size (3,3) with indices:
# tensor([[ 0,  2,  1,  0],
#         [ 0,  1,  0,  0]])
# and values:
# tensor([ 2.,  3.,  4.,  6.])

print(res.to_dense())
# tensor([[ 8.,  0.,  0.],
#         [ 4.,  0.,  0.],
#         [ 0.,  3.,  0.]])

Previous answer for 1-dimentional sparse indices

Here is a (probably sub-optimal and not covering all edge cases) solution, following the intuitions shared in a related open issue (hopefully this feature will be properly covered soon):

# Constructing a sparse tensor a bit more complicated for the sake of demo:
i = torch.LongTensor([[0, 1, 5, 2]])
v = torch.FloatTensor([[1, 3, 0], [5, 7, 0], [9, 9, 9], [1,2,3]])
test1 = torch.sparse.FloatTensor(i, v)

# note: if you directly have sparse `test1`, you can get `i` and `v`:
# i, v = test1._indices(), test1._values()

# Getting the slicing indices:
idx = [1,2]

# Preparing to slice `v` according to `idx`.
# For that, we gather the list of indices `v_idx` such that i[v_idx[k]] == idx[k]:
i_squeeze = i.squeeze()
v_idx = [(i_squeeze == j).nonzero() for j in idx] # <- doesn't seem optimal...
v_idx = torch.cat(v_idx, dim=1)

# Slicing `v` accordingly:
v_sliced = v[v_idx.squeeze()][:,idx]

# Now defining your resulting sparse tensor.
# I'm not sure what kind of indexing you want, so here are 2 possibilities:
# 1) "Dense" indixing:
test1x = torch.sparse.FloatTensor(torch.arange(v_idx.size(1)).long().unsqueeze(0), v_sliced)
print(test1x)
# torch.sparse.FloatTensor of size (3,2) with indices:
#
#  0  1
# [torch.LongTensor of size (1,2)]
# and values:
#
#  7  0
#  2  3
# [torch.FloatTensor of size (2,2)]

# 2) "Sparse" indixing using the original `idx`:
test1x = torch.sparse.FloatTensor(autograd.Variable(torch.LongTensor(idx)).unsqueeze(0), v_sliced)
# note: this indexing would fail if elements of `idx` were not in `i`.
print(test1x)
# torch.sparse.FloatTensor of size (3,2) with indices:
#
#  1  2
# [torch.LongTensor of size (1,2)]
# and values:
#
#  7  0
#  2  3
# [torch.FloatTensor of size (2,2)]

Column/row slicing a torch sparse tensor

1 Answers

Possible answer for 2-dimentional sparse indices

Previous answer for 1-dimentional sparse indices