I have a 1D array in numpy and I want to find the position of the index where a value exceeds the value in numpy array.
E.g.
aa = range(-10,10)
Find position in aa
where, the value 5
gets exceeded.
I have a 1D array in numpy and I want to find the position of the index where a value exceeds the value in numpy array.
E.g.
aa = range(-10,10)
Find position in aa
where, the value 5
gets exceeded.
This is a little faster (and looks nicer)
np.argmax(aa>5)
Since argmax
will stop at the first True
("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and doesn't save another list.
In [2]: N = 10000
In [3]: aa = np.arange(-N,N)
In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop
In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop
In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop
given the sorted content of your array, there is an even faster method: searchsorted.
import time
N = 10000
aa = np.arange(-N,N)
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]
# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop
I was also interested in this and I've compared all the suggested answers with perfplot. (Disclaimer: I'm the author of perfplot.)
If you know that the array you're looking through is already sorted, then
numpy.searchsorted(a, alpha)
is for you. It's O(log(n)) operation, i.e., the speed hardly depends on the size of the array. You can't get faster than that.
If you don't know anything about your array, you're not going wrong with
numpy.argmax(a > alpha)
Already sorted:
Unsorted:
Code to reproduce the plot:
import numpy
import perfplot
alpha = 0.5
numpy.random.seed(0)
def argmax(data):
return numpy.argmax(data > alpha)
def where(data):
return numpy.where(data > alpha)[0][0]
def nonzero(data):
return numpy.nonzero(data > alpha)[0][0]
def searchsorted(data):
return numpy.searchsorted(data, alpha)
perfplot.save(
"out.png",
# setup=numpy.random.rand,
setup=lambda n: numpy.sort(numpy.random.rand(n)),
kernels=[argmax, where, nonzero, searchsorted],
n_range=[2 ** k for k in range(2, 23)],
xlabel="len(array)",
)
In case of a range
or any other linearly increasing array you can simply calculate the index programmatically, no need to actually iterate over the array at all:
def first_index_calculate_range_like(val, arr):
if len(arr) == 0:
raise ValueError('no value greater than {}'.format(val))
elif len(arr) == 1:
if arr[0] > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))
first_value = arr[0]
step = arr[1] - first_value
# For linearly decreasing arrays or constant arrays we only need to check
# the first element, because if that does not satisfy the condition
# no other element will.
if step <= 0:
if first_value > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))
calculated_position = (val - first_value) / step
if calculated_position < 0:
return 0
elif calculated_position > len(arr) - 1:
raise ValueError('no value greater than {}'.format(val))
return int(calculated_position) + 1
One could probably improve that a bit. I have made sure it works correctly for a few sample arrays and values but that doesn't mean there couldn't be mistakes in there, especially considering that it uses floats...
>>> import numpy as np
>>> first_index_calculate_range_like(5, np.arange(-10, 10))
16
>>> np.arange(-10, 10)[16] # double check
6
>>> first_index_calculate_range_like(4.8, np.arange(-10, 10))
15
Given that it can calculate the position without any iteration it will be constant time (O(1)
) and can probably beat all other mentioned approaches. However it requires a constant step in the array, otherwise it will produce wrong results.
A more general approach would be using a numba function:
@nb.njit
def first_index_numba(val, arr):
for idx in range(len(arr)):
if arr[idx] > val:
return idx
return -1
That will work for any array but it has to iterate over the array, so in the average case it will be O(n)
:
>>> first_index_numba(4.8, np.arange(-10, 10))
15
>>> first_index_numba(5, np.arange(-10, 10))
16
Even though Nico Schlömer already provided some benchmarks I thought it might be useful to include my new solutions and to test for different "values".
The test setup:
import numpy as np
import math
import numba as nb
def first_index_using_argmax(val, arr):
return np.argmax(arr > val)
def first_index_using_where(val, arr):
return np.where(arr > val)[0][0]
def first_index_using_nonzero(val, arr):
return np.nonzero(arr > val)[0][0]
def first_index_using_searchsorted(val, arr):
return np.searchsorted(arr, val) + 1
def first_index_using_min(val, arr):
return np.min(np.where(arr > val))
def first_index_calculate_range_like(val, arr):
if len(arr) == 0:
raise ValueError('empty array')
elif len(arr) == 1:
if arr[0] > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))
first_value = arr[0]
step = arr[1] - first_value
if step <= 0:
if first_value > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))
calculated_position = (val - first_value) / step
if calculated_position < 0:
return 0
elif calculated_position > len(arr) - 1:
raise ValueError('no value greater than {}'.format(val))
return int(calculated_position) + 1
@nb.njit
def first_index_numba(val, arr):
for idx in range(len(arr)):
if arr[idx] > val:
return idx
return -1
funcs = [
first_index_using_argmax,
first_index_using_min,
first_index_using_nonzero,
first_index_calculate_range_like,
first_index_numba,
first_index_using_searchsorted,
first_index_using_where
]
from simple_benchmark import benchmark, MultiArgument
and the plots were generated using:
%matplotlib notebook
b.plot()
b = benchmark(
funcs,
{2**i: MultiArgument([0, np.arange(2**i)]) for i in range(2, 20)},
argument_name="array size")
The numba function performs best followed by the calculate-function and the searchsorted function. The other solutions perform much worse.
b = benchmark(
funcs,
{2**i: MultiArgument([2**i-2, np.arange(2**i)]) for i in range(2, 20)},
argument_name="array size")
For small arrays the numba function performs amazingly fast, however for bigger arrays it's outperformed by the calculate-function and the searchsorted function.
b = benchmark(
funcs,
{2**i: MultiArgument([np.sqrt(2**i), np.arange(2**i)]) for i in range(2, 20)},
argument_name="array size")
This is more interesting. Again numba and the calculate function perform great, however this is actually triggering the worst case of searchsorted which really doesn't work well in this case.
Another interesting point is how these function behave if there is no value whose index should be returned:
arr = np.ones(100)
value = 2
for func in funcs:
print(func.__name__)
try:
print('-->', func(value, arr))
except Exception as e:
print('-->', e)
With this result:
first_index_using_argmax
--> 0
first_index_using_min
--> zero-size array to reduction operation minimum which has no identity
first_index_using_nonzero
--> index 0 is out of bounds for axis 0 with size 0
first_index_calculate_range_like
--> no value greater than 2
first_index_numba
--> -1
first_index_using_searchsorted
--> 101
first_index_using_where
--> index 0 is out of bounds for axis 0 with size 0
Searchsorted, argmax, and numba simply return a wrong value. However searchsorted
and numba
return an index that is not a valid index for the array.
The functions where
, min
, nonzero
and calculate
throw an exception. However only the exception for calculate
actually says anything helpful.
That means one actually has to wrap these calls in an appropriate wrapper function that catches exceptions or invalid return values and handle appropriately, at least if you aren't sure if the value could be in the array.
Note: The calculate and searchsorted
options only work in special conditions. The "calculate" function requires a constant step and the searchsorted requires the array to be sorted. So these could be useful in the right circumstances but aren't general solutions for this problem. In case you're dealing with sorted Python lists you might want to take a look at the bisect module instead of using Numpys searchsorted.