If to compare different methods to find value in 1d sorted array, there is no strong winner for array sizes =<10000, each method has its own 'cons' depending on target position, array size and numpy/list option. However, for large numpy arrays(>1M) it is clear that bisect and searchsort are winners.
There are 5 methods bellow: imported bisect with verification, python loop, numba solution tal's answer at Numpy: find first index of value fast, User_Targaryen's answer above, np.searchsort with verification. All of them accept numpy/lists, except tal's answer, where the to-numpy-conversion is required for lists. Only bisect and find_index methods feels fine resisting to lists len increase.
Bellow is an example for 10K numpy array:
import numpy as np
from numba import jit
from timeit import timeit
import bisect
# bisect.bisect
def bisect_(nums,x):
pos = bisect.bisect_left(nums, x)
try:
if nums[pos] == x:
return pos
else: # e.g., bisect_([1,3,4],2)
return -1
except: # e.g. bisect_([1,3,4],5)
return -1
# python loop
def find_first(vec, item):
for i, v in enumerate(vec):
if item == v:
return i
return -1
# tal's answer at https://stackguides.com/questions/7632963/numpy-find-first-index-of-value-fast
@jit(nopython=True)
def find_firstj(vec, item):
"""return the index of the first occurence of item in vec"""
for i, v in enumerate(vec):
if item == v:
return i
return -1
# User_Targaryen's answer above
def binary_search(nums,low,high,x):
while low<=high:
mid = (low+high)//2
if nums[mid]==x:
return mid+1
elif nums[mid]>x:
high = mid-1
else:
low = mid+1
return -1
def find_index(nums,x):
i = 1
l = len(nums)
while i<l:
if nums[i-1]==x:
return i
elif 2*i<l and nums[2*i-1]>x:
return binary_search(nums,i-1,2*i-1,x)
i = 2*i
return binary_search(nums,i//2,l-1,x)
# numpy searchsort
def searchsort(u, x):
pos=np.searchsorted(u, x)
try:
if u[pos] == x:
return pos
else: # e.g., bisect_([1,3,4],2)
return -1
except: # e.g. bisect_([1,3,4],5)
return -1
SIZE = 10000
u = np.arange(SIZE)
# at the beginning
print('---- at the beginning ----')
x = 40
print('bisect.bisect', timeit(lambda: bisect_(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')
# in the middle
print('---- in the middle ----')
x = 5040
print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')
# in the end
print('---- at the end ----')
x = 9940
print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')
For 10K numpy array the result is:
---- at the beginning ----
bisect.bisect 0.009455299004912376 s
find_first 0.01455735880881548 s
find_firstj 0.023144721053540707 s
find_index 0.010857589077204466 s
searchsort 0.005023379344493151 s
---- in the middle ----
bisect.bisect 0.002835090272128582 s
find_first 1.3125479291193187 s
find_firstj 0.003973201382905245 s
find_index 0.01642579911276698 s
searchsort 0.007164010312408209 s
---- at the end ----
bisect.bisect 0.0030333898030221462 s
find_first 3.0728489509783685 s
find_firstj 0.006884939502924681 s
find_index 0.01768168993294239 s
searchsort 0.007541531231254339 s
For 1.5M array (x = 40/750040/1499400) the result is:
---- at the beginning ----
bisect.bisect 0.004126268904656172 s
find_first 0.009320289827883244 s
find_firstj 0.0005324906669557095 s
find_index 0.006837178952991962 s
searchsort 0.004959029611200094 s
---- in the middle ----
bisect.bisect 0.003964949864894152 s
find_first 166.71787671046332 s
find_firstj 0.5532677914015949 s
find_index 0.024038420524448156 s
searchsort 0.004650468472391367 s
---- at the end ----
bisect.bisect 0.0036938488483428955 s
find_first 329.8780040605925 s
find_firstj 1.570841120555997 s
find_index 0.026646379847079515 s
searchsort 0.00493861036375165 s
Please, copypaste to check for other sizes and options.
log(size(A))
you wanted, binary search would work. If you're trying to optimize in terms of the location ofx
, my gut says check powers of2
until you find a value larger thanx
, then binary search in the top half, but I'm not completely clear on what you're asking. – Jesus is LordΘ(logd)
which means the best case is logd too, and its not correct in binary search. – Genadilogd
. The best case for any algorithm can beO(1)
by sheer chance/coincidence – User_Targaryen