Find element in a sorted array

votes

Given a sorted array A and an element x, I need to find an algorithm that returns the index of x in A or -1 if x is not in A. the time complexity of the algorithm should be Θ(logd) when d is the number of elements that appears before x in A, or if x is not in A, d is the number of elements that were before x if he was in A.

Binary search is not good enough because its best case is O(1). I thought of starting from the beginning of the array, and start checking the indexes that are powers of 2. but I got lost.

arraysalgorithmsearchtime-complexity

Binary search worst case is O(log n)… - taylor swift

If it was a log(size(A)) you wanted, binary search would work. If you're trying to optimize in terms of the location of x, my gut says check powers of 2 until you find a value larger than x, then binary search in the top half, but I'm not completely clear on what you're asking. - Jesus is Lord

@taylorswift I know, but I need Θ(logd) which means the best case is logd too, and its not correct in binary search. - Genadi

@Genadi , I’m not sure if you understand what best case and worst case complexity means, worst case means that the algorithm can never take longer than O(log n) to perform this search, no matter the input. If you need an O(log n) algorithm, than every O(1) algorithm satisfies that since O(1) <~ O(log(n)). - taylor swift

According to your notation, you mean to say that the average time complexity should be logd. The best case for any algorithm can be O(1) by sheer chance/coincidence - User_Targaryen

3 Answers

votes

You can do it like this: It uses Θ(log d) steps to find a range of size Θ(d), and then does a binary search in that range in another Θ(log d) steps.

int search(int[] array, int length, int valueToFind)
{
    int pos=0;
    int limit=min(length,1);
    while(limit < length && array[limit] < valueToFind)
    {
        pos=limit+1;
        limit = min(length, limit*2+1);
    }
    while(pos<limit)
    {
        int testpos = pos+((limit-pos)>>1);

        if (array[testpos]<valueToFind)
            pos=testpos+1;
        else
            limit=testpos;
    }
    return (pos < length && array[pos]==valueToFind ? pos : -1);
}

Note that the binary search I use does not exit early, and always takes Θ(log (limit-pos)) time. Even so it's faster than other searches that do exit early, because it does only one comparison per iteration. I describe other advantages here:

How can I simplify this working Binary Search code in C?

votes

I have a simple python implementation based on the power of 2's approach as discussed in the comments section. Please have a look:

def binary_search(nums,low,high,x):
  while low<=high:
    mid = (low+high)/2
    if nums[mid]==x:
      return mid+1
    elif nums[mid]>x:
      high = mid-1
    else:
      low = mid+1
  return -1

def find_index(nums,x):
  i = 1
  l = len(nums)
  while i<l:
    if nums[i-1]==x:
      return i
    elif 2*i<l and nums[2*i-1]>x:
      return binary_search(nums,i-1,2*i-1,x)
    i = 2*i
  return binary_search(nums,i/2,l-1,x)

def main():
  line = raw_input("Enter numbers: ").split()
  nums = []
  for ele in line:
    nums.append(int(ele))

  nums = sorted(nums)
  print "Sorted array: ",nums
  x = int(raw_input("Enter the element to find in sorted array: "))
  print find_index(nums, x)

main()

Firstly, it tries to find the target element by moving over indexes with a power of 2.

At any point if the current element exceeds the target element, then we do a binary search between the current index(current power of 2) and the previous index(previous power of 2).

The time complexity of the searching process is logd on an average. Also the best case time complexity is logd as expected!!

Hope it is easy to understand!!!!

votes

If to compare different methods to find value in 1d sorted array, there is no strong winner for array sizes =<10000, each method has its own 'cons' depending on target position, array size and numpy/list option. However, for large numpy arrays(>1M) it is clear that bisect and searchsort are winners.

There are 5 methods bellow: imported bisect with verification, python loop, numba solution tal's answer at Numpy: find first index of value fast, User_Targaryen's answer above, np.searchsort with verification. All of them accept numpy/lists, except tal's answer, where the to-numpy-conversion is required for lists. Only bisect and find_index methods feels fine resisting to lists len increase.

Bellow is an example for 10K numpy array:

import numpy as np
from numba import jit
from timeit import timeit
import bisect
   
# bisect.bisect
def bisect_(nums,x):
    pos = bisect.bisect_left(nums, x)
    try: 
        if nums[pos] == x:
            return pos
        else:  # e.g., bisect_([1,3,4],2)
            return -1
    except:  # e.g. bisect_([1,3,4],5) 
        return -1


# python loop
def find_first(vec, item):
    for i, v in enumerate(vec):
        if item == v:
            return i
    return -1

# tal's answer at https://stackguides.com/questions/7632963/numpy-find-first-index-of-value-fast
@jit(nopython=True)
def find_firstj(vec, item):
    """return the index of the first occurence of item in vec"""
    for i, v in enumerate(vec):
        if item == v:
            return i
    return -1

# User_Targaryen's answer above
def binary_search(nums,low,high,x):
  while low<=high:
    mid = (low+high)//2
    if nums[mid]==x:
      return mid+1
    elif nums[mid]>x:
      high = mid-1
    else:
      low = mid+1
  return -1

def find_index(nums,x):
  i = 1
  l = len(nums)
  while i<l:
    if nums[i-1]==x:
      return i
    elif 2*i<l and nums[2*i-1]>x:
      return binary_search(nums,i-1,2*i-1,x)
    i = 2*i
  return binary_search(nums,i//2,l-1,x)

# numpy searchsort
def searchsort(u, x):
    pos=np.searchsorted(u, x)
    try: 
        if u[pos] == x:
            return pos
        else:  # e.g., bisect_([1,3,4],2)
            return -1
    except:  # e.g. bisect_([1,3,4],5)
        return -1

SIZE = 10000
u = np.arange(SIZE)

# at the beginning
print('---- at the beginning ----')
x = 40

print('bisect.bisect', timeit(lambda: bisect_(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

# in the middle
print('---- in the middle ----')
x = 5040

print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

# in the end
print('---- at the end ----')


 x = 9940
    
    print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
    print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
    print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
    print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
    print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

For 10K numpy array the result is:

    ---- at the beginning ----
bisect.bisect 0.009455299004912376 s
find_first 0.01455735880881548 s
find_firstj 0.023144721053540707 s
find_index 0.010857589077204466 s
searchsort 0.005023379344493151 s
---- in the middle ----
bisect.bisect 0.002835090272128582 s
find_first 1.3125479291193187 s
find_firstj 0.003973201382905245 s
find_index 0.01642579911276698 s
searchsort 0.007164010312408209 s
---- at the end ----
bisect.bisect 0.0030333898030221462 s
find_first 3.0728489509783685 s
find_firstj 0.006884939502924681 s
find_index 0.01768168993294239 s
searchsort 0.007541531231254339 s

For 1.5M array (x = 40/750040/1499400) the result is:

---- at the beginning ----
bisect.bisect 0.004126268904656172 s
find_first 0.009320289827883244 s
find_firstj 0.0005324906669557095 s
find_index 0.006837178952991962 s
searchsort 0.004959029611200094 s
---- in the middle ----
bisect.bisect 0.003964949864894152 s
find_first 166.71787671046332 s
find_firstj 0.5532677914015949 s
find_index 0.024038420524448156 s
searchsort 0.004650468472391367 s
---- at the end ----
bisect.bisect 0.0036938488483428955 s
find_first 329.8780040605925 s
find_firstj 1.570841120555997 s
find_index 0.026646379847079515 s
searchsort 0.00493861036375165 s

Please, copypaste to check for other sizes and options.