3 votes

In brief

In Python 3.6, using Numpy, what would be the most efficient way to rearrange the elements of each row of a 2D array according to the indices in a second, index 2D array of the same shape?

Detailed

Suppose I have the following two 5 x 5 arrays, called A and B:

import numpy as np
A = np.array([[0.32, 0.35, 0.88, 0.63, 1.  ],
              [0.23, 0.69, 0.98, 0.22, 0.96],
              [0.7 , 0.51, 0.09, 0.58, 0.19],
              [0.98, 0.42, 0.62, 0.94, 0.46],
              [0.48, 0.59, 0.17, 0.23, 0.98]])

B = np.array([[4, 0, 3, 2, 1],
              [3, 2, 4, 1, 0],
              [4, 3, 0, 2, 1],
              [4, 2, 0, 3, 1],
              [0, 3, 1, 2, 4]])

I can successfully rearrange A using B as an index array via np.array(list(map(lambda i, j: j[i], B, A))):

array([[1.  , 0.32, 0.63, 0.88, 0.35],
       [0.22, 0.98, 0.96, 0.69, 0.23],
       [0.19, 0.58, 0.7 , 0.09, 0.51],
       [0.46, 0.62, 0.98, 0.94, 0.42],
       [0.48, 0.23, 0.59, 0.17, 0.98]])
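
For clarity, the map/lambda one-liner above is equivalent to this explicit per-row rewrite (my own restatement, reusing the A and B defined above): row r of the result is A[r] reordered by the index row B[r].

# Per-row equivalent of the map/lambda one-liner above:
# each output row is A[r] reordered by B[r]
X = np.array([a[b] for a, b in zip(A, B)])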

However, when the dimensions of A and B increase, such a solution becomes really inefficient. If I am not mistaken, that is because:

  • the lambda loops over all rows of A instead of relying on Numpy vectorization
  • mapping is slow
  • converting the resulting list back to an array eats precious time.

Since in my real use case those arrays can grow quite big, and I have to reorder many of them in a long loop, a lot of my current performance bottleneck (measured with a profiler) comes from that single line of code above.

My question: what would be the most efficient, Numpy-smart way of achieving the above?

Toy code to generate larger test arrays and time the process could be:

import numpy as np
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
    np.random.shuffle(B[r])
%time X = np.array(list(map(lambda i, j: j[i], B, A)))
np.take_along_axis(A, B, 1)? – Paul Panzer
A[np.arange(5)[:, None], B] should also work, but take_along_axis is easier (if you remember it exists :) ). – hpaulj
@PaulPanzer I made some tests and the take_along_axis function is actually slower than a for loop. Mystery... – bousof
Oops! Are your arrays rather small? What about @hpaulj's suggestion? – Paul Panzer
@PaulPanzer Oh, it wasn't me (the OP) who commented before. My arrays can be rather big, significantly bigger than 20000 x 10000. I am playing with @bousof's suggestion, and it does seem that the loop becomes the most attractive for big nCols. take_along_axis and @hpaulj's are faster as nCols decreases. – AbbieW
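
As a quick sanity check, the two suggestions from the comments can be run against the 5 x 5 arrays defined in the question; a minimal sketch (np.take_along_axis requires NumPy >= 1.15):

# All three approaches applied to the A and B from the question
X0 = np.array(list(map(lambda i, j: j[i], B, A)))  # original map-based version
X1 = np.take_along_axis(A, B, 1)                   # Paul Panzer's suggestion
X2 = A[np.arange(A.shape[0])[:, None], B]          # hpaulj's suggestion
assert np.array_equal(X0, X1) and np.array_equal(X1, X2)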

1 Answer

1 vote

A comparison with three other possibilities:

import numpy as np
import time

# Input
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
  np.random.shuffle(B[r])

# Original
t_start = time.time()
X = np.array(list(map(lambda i, j: j[i], B, A)))
print('Timer 1:', time.time()-t_start, 's')

# FOR loop
t_start = time.time()
X = np.zeros((nRows, nCols))
for i in range(nRows):
  X[i] = A[i][B[i]]
print('Timer 2:', time.time()-t_start, 's')

# take_along_axis
t_start = time.time()
X = np.take_along_axis(A, B, 1)
print('Timer 3:', time.time()-t_start, 's')

# Indexing
t_start = time.time()
X = A[np.arange(nRows)[:, None], B]
print('Timer 4:', time.time()-t_start, 's')

Output:

% python3 script.py
Timer 1: 2.191567897796631 s
Timer 2: 1.3516249656677246 s
Timer 3: 1.675267219543457 s
Timer 4: 1.646852970123291 s

For a low number of columns, (nRows, nCols) = (200000, 10), the results are completely different, however:

% python3 script.py
Timer 1: 0.2729799747467041 s
Timer 2: 0.22678399085998535 s
Timer 3: 0.016162633895874023 s
Timer 4: 0.014748811721801758 s
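
Since the fastest method evidently depends on the array shape, a sketch extending the script to sweep both shapes in one run could look like the following (the bench helper and the methods dict are illustrative names I introduce here, not part of the original answer):

import numpy as np
import time

def bench(fn, A, B, repeats=3):
    # Best wall-clock time over a few repeats
    best = float('inf')
    for _ in range(repeats):
        t_start = time.time()
        fn(A, B)
        best = min(best, time.time() - t_start)
    return best

methods = {
    'map':             lambda A, B: np.array(list(map(lambda i, j: j[i], B, A))),
    'take_along_axis': lambda A, B: np.take_along_axis(A, B, 1),
    'fancy indexing':  lambda A, B: A[np.arange(A.shape[0])[:, None], B],
}

for shape in [(20000, 10000), (200000, 10)]:
    A = np.round(np.random.uniform(0, 1, shape), 2)
    B = np.array([np.random.permutation(shape[1]) for _ in range(shape[0])])
    print(shape, {name: round(bench(f, A, B), 3) for name, f in methods.items()})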