3
votes

I can index my numpy array / pytorch tensor with a boolean array/tensor of the same shape or an array/tensor containing integer indexes of the elements I'm after. Which is faster?

2

2 Answers

2
votes

The following tests indicate that it's generally 3x to 20x faster with an index array in both numpy and pytorch:

In [1]: a = torch.arange(int(1e5))
idxs = torch.randint(len(a), (int(1e4),))
ind = torch.zeros_like(a, dtype=torch.uint8)
ind[idxs] = 1
ac, idxsc, indc = a.cuda(), idxs.cuda(), ind.cuda()

In [2]: %timeit a[idxs]
73.4 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit a[ind]
622 µs ± 8.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: %timeit ac[idxsc]
9.51 µs ± 475 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit ac[indc]
59.6 µs ± 313 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: idxs = torch.arange(len(a)-1, dtype=torch.long)
ind = torch.zeros_like(a, dtype=torch.uint8)
ind[idxs] = 1
ac, idxsc, indc = a.cuda(), idxs.cuda(), ind.cuda()

In [7]: %timeit a[idxs]
146 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit a[ind]
4.59 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [9]: %timeit ac[idxsc]
33 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: %timeit ac[indc]
85.9 µs ± 56.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
0
votes

As the prior solution shows, I would expect the integer based indexing to be faster, since the dimension of the output tensor equals the indexing tensors dimension, which makes memory allocation easier.