In brief
In Python 3.6 and using Numpy, what would be the most efficient way to rearrange the elements of a 2D array according to indices present in a different, similarly shaped, index 2D array?
Detailed
Suppose I have the following two 5 x 5 arrays, called A and B:
import numpy as np
A = np.array([[0.32, 0.35, 0.88, 0.63, 1.  ],
              [0.23, 0.69, 0.98, 0.22, 0.96],
              [0.7 , 0.51, 0.09, 0.58, 0.19],
              [0.98, 0.42, 0.62, 0.94, 0.46],
              [0.48, 0.59, 0.17, 0.23, 0.98]])
B = np.array([[4, 0, 3, 2, 1],
              [3, 2, 4, 1, 0],
              [4, 3, 0, 2, 1],
              [4, 2, 0, 3, 1],
              [0, 3, 1, 2, 4]])
I can successfully rearrange A using B as an index array with np.array(list(map(lambda i, j: j[i], B, A))):
array([[1. , 0.32, 0.63, 0.88, 0.35],
[0.22, 0.98, 0.96, 0.69, 0.23],
[0.19, 0.58, 0.7 , 0.09, 0.51],
[0.46, 0.62, 0.98, 0.94, 0.42],
[0.48, 0.23, 0.59, 0.17, 0.98]])
However, when the dimensions of A and B increase, such a solution becomes really inefficient. If I am not mistaken, that is because:
- using the lambda loops over all rows of A instead of relying on Numpy vectorizations
- mapping is slow
- converting list to array eats precious time.
Since in my real use case those arrays can grow quite big, and I have to reorder many of them in a long loop, a lot of my current performance bottleneck (measured with a profiler) comes from that single line of code above.
My question: what would be the most efficient, more Numpy-smart way of achieving the above?
A toy code to test general arrays and time the process could be:
import numpy as np
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
    np.random.shuffle(B[r])
%time X = np.array(list(map(lambda i, j: j[i], B, A)))
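Since %time only works inside IPython/Jupyter, here is a rough sketch of the same timing as a plain script. The reduced sizes, the helper name reorder_with_map, the fixed seed and the argsort-based way of building B are all assumptions of mine for illustration (np.random.default_rng also needs NumPy 1.17 or later); the numbers it prints will vary by machine.

```python
import time
import numpy as np

def reorder_with_map(A, B):
    """Row-by-row reordering, same expression as the one-liner in the question."""
    return np.array(list(map(lambda i, j: j[i], B, A)))

# Reduced sizes (an assumption, chosen so the script finishes quickly).
nRows, nCols = 2000, 1000
rng = np.random.default_rng(0)  # fixed seed, only for repeatability

A = np.round(rng.uniform(0, 1, (nRows, nCols)), 2)
# Argsorting random numbers along each row gives an independent
# permutation of 0..nCols-1 per row, without a Python-level shuffle loop.
B = np.argsort(rng.random((nRows, nCols)), axis=1)

start = time.perf_counter()
X = reorder_with_map(A, B)
elapsed = time.perf_counter() - start
print(f"map-based reorder: {elapsed:.3f} s")
```

The same start/stop pattern can be wrapped around any alternative to compare it against this baseline on identical A and B.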
np.take_along_axis(A, B, 1)? – Paul Panzer
A[np.arange(5)[:, None], B] should also work, but take_along is easier (if you remember it exists :) ). – hpaulj
take_along_axis and @hpaulj's are faster as nCols decreases – AbbieW
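For reference, the two suggestions from the comments can be checked against the original one-liner on the small example. This is only a minimal sketch: the variable names expected, X1, X2 and rows are mine, np.arange(5) is generalised to np.arange(A.shape[0]) so it is not tied to 5 rows, and np.take_along_axis assumes NumPy 1.15 or later.

```python
import numpy as np

A = np.array([[0.32, 0.35, 0.88, 0.63, 1.  ],
              [0.23, 0.69, 0.98, 0.22, 0.96],
              [0.7 , 0.51, 0.09, 0.58, 0.19],
              [0.98, 0.42, 0.62, 0.94, 0.46],
              [0.48, 0.59, 0.17, 0.23, 0.98]])
B = np.array([[4, 0, 3, 2, 1],
              [3, 2, 4, 1, 0],
              [4, 3, 0, 2, 1],
              [4, 2, 0, 3, 1],
              [0, 3, 1, 2, 4]])

# Baseline: the map/lambda one-liner from the question.
expected = np.array(list(map(lambda i, j: j[i], B, A)))

# Suggestion 1: pick A[r, B[r, c]] for every (r, c) in one vectorized call.
X1 = np.take_along_axis(A, B, axis=1)

# Suggestion 2: integer-array indexing. The column vector of row numbers
# has shape (n, 1) and broadcasts against B, so element (r, c) is A[r, B[r, c]].
rows = np.arange(A.shape[0])[:, None]
X2 = A[rows, B]

print(np.array_equal(X1, expected))
print(np.array_equal(X2, expected))
```

Both checks compare against the output of the original expression, so the sketch only shows equivalence on this input; relative speed on large arrays would still need to be measured.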