80
votes

In order to find the index of the smallest value, I can use argmin:

import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(A.argmin())     # 4 because A[4] = 0.1

But how can I find the indices of the k-smallest values?

I'm looking for something like:

print(A.argmin(numberofvalues=3))
# [4, 0, 7]  because A[4] <= A[0] <= A[7] <= all other A[i]

Note: in my use case A has between ~10,000 and 100,000 values, and I'm only interested in the indices of the k=10 smallest values. k will never be > 10.

4
See this question, especially the second answer there, for the best solution to this (it's O(n); fully sorting the entire array is not necessary). - Alex Riley

4 Answers

135
votes

Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in its sorted position and that all smaller elements are moved before it. Thus the first k elements will be the k smallest elements.

import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]

The first k entries of idx select the k smallest values. Note that these may not be in sorted order.

print(A[idx[:k]])
# [ 0.1  1.   1.5]
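If the k indices are needed in ascending order of value, as in the question's desired output, one option is to sort just those k candidates afterwards. The following is a minimal sketch; the helper name k_smallest_sorted is hypothetical, and it assumes a 1-D array with 1 <= k <= len(a). It partitions around position k - 1 rather than k, which still places the k smallest values in the first k slots and also stays valid when k equals the array length.

```python
import numpy as np

def k_smallest_sorted(a, k):
    """Indices of the k smallest values of 1-D array a, ordered by ascending value."""
    # Partition around position k - 1 so the first k slots hold the k smallest;
    # using k - 1 (rather than k) also keeps k == len(a) valid.
    idx = np.argpartition(a, k - 1)[:k]
    # Sort only those k candidates by their values.
    return idx[np.argsort(a[idx])]

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(k_smallest_sorted(A, 3).tolist())
# [4, 0, 7]
```

The extra argsort touches only k elements, so for small k it adds very little on top of the partition step.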

To obtain the k-largest values use

idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]

A[idx[-k:]]
# [  9.  17.  17.]

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest. That won't always work. For example, these are NOT the 3 largest values in x:

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
# array([ 70,  80, 100])
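For contrast, here is a small sketch of the approach that should work on the same x: partition around position -k and take the last k entries. The order of the returned indices is not guaranteed, so the values are sorted before printing.

```python
import numpy as np

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
k = 3

# Partition around position -k so the last k slots hold the k largest values.
idx = np.argpartition(x, -k)[-k:]
print(sorted(x[idx].tolist()))
# [80, 90, 100]
```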

Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.

In [2]: x = np.random.randn(100000)

In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop

In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop

In [5]: np.all(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True
20
votes

You can use numpy.argsort with slicing:

>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)
2
votes

For n-dimensional arrays, this function works well. The indices are returned as a tuple of arrays that can be used directly to index the original array. If you want a list of index tuples instead, transpose the result before converting it to a list.

To retrieve the k largest, simply pass in -k.

def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . . 
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()

Example:

r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)

indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
#  array([3, 2, 0, 1], dtype=int64),
#  array([3, 0, 3, 3], dtype=int64))

arr[indices]
# array([ 4, 31, 54, 77])

%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
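The k-largest case for an n-dimensional array can also be written out explicitly. This is only a sketch of the same idea, not the function above: the name k_largest_nd and the demo array are made up for illustration, and it assumes 1 <= k <= arr.size.

```python
import numpy as np

def k_largest_nd(arr, k):
    """Tuple of index arrays locating the k largest values of an n-d array."""
    flat = arr.ravel()
    # Partition the flattened array around position -k; the last k slots
    # then hold the flat indices of the k largest values (in no set order).
    flat_idx = np.argpartition(flat, -k)[-k:]
    # Convert flat indices back to one index array per dimension.
    return np.unravel_index(flat_idx, arr.shape)

demo = np.arange(24).reshape(2, 3, 4)
top = k_largest_nd(demo, 3)
print(sorted(demo[top].tolist()))
# [21, 22, 23]
```

Because the result is a tuple with one array per axis, demo[top] picks out the values directly, in the same way as arr[indices] in the example above.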
0
votes

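numpy.partition(your_array, k) is an alternative if you want the values rather than the indices. It returns a partitioned copy in which the element at position k is in its sorted position and all smaller elements come before it, so the first k entries are the k smallest values, in no particular order.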
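A minimal sketch of this, reusing the array from the question. It partitions around position k - 1 (which also keeps k equal to the array length valid) and sorts the k values only to make the printed output deterministic.

```python
import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

# The first k entries of the partitioned copy are the k smallest values.
smallest = np.partition(A, k - 1)[:k]
print(np.sort(smallest).tolist())
# [0.1, 1.0, 1.5]
```

Note that this yields values only; np.argpartition is the one to use when the positions in the original array are needed.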