2
votes

This was an interview question I was recently asked at Adobe:

In an array, find the maximum length subarray with the condition 2 * min > max, where min is the minimum element of the subarray, and max is the maximum element of the subarray.

Does anyone have an approach better than O(n^2)?
Of course, we can't sort, as a subarray is required.

Below is my O(n^2) approach:

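int maxLength = Integer.MIN_VALUE;
for (int i = 0; i < A.length - 1; i++)
  for (int j = i + 1; j < A.length; j++)
  {
    int min = findMin(A, i, j);
    int max = findMax(A, i, j);
    if (2 * min > max) {   // the subarray A[i..j] meets the condition
      if (j - i + 1 > maxLength)
        maxLength = j - i + 1;
    }
  }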

Does anybody know an O(n) solution?

3
I incorporated additional info from the comments and reorganized the whole question in an attempt to make it more digestible. - zx485
Your solution is actually O(n³) rather than O(n²); you forgot to take into account the cost of findMin and findMax, which would be linear unless you built something extra to make it faster. That said, it's not hard to improve your solution to be in O(n²), by incrementally adjusting min and max as j iterates over the array rather than recomputing them from scratch on each iteration. - ruakh
FWIW, although an O(n) solution seems like a tall order, I see an O(n log n) solution: if you use heaps or a red-black tree to help keep track of min and max, you can write a while-loop that increments j whenever possible without violating the criterion, and increments i whenever incrementing j is not possible without violating the criterion. - ruakh
I doubt that using heaps will give you O(n log n). Would you use a min and a max heap? If you encounter a number that is too small, you would have to remove all maxes that violate the constraint from both heaps and take the highest index + 1 to get the new start point. Or am I missing something? - maraca
@maraca: If you're replying to a comment of mine, and you want me to see the reply, you should start the comment with @ruakh: so it shows up in my inbox. (It's only by chance that I saw your reply here.) But to answer your question -- yes, you'd use a min-heap and a max-heap, and yes, you'd sometimes need to remove a whole bunch of elements at once. But any given element is added at most once and removed at most once, so when you add up all the steps, they come out to O(n log n). (Note that log n + 0 + 0 + 0 = log n.) - ruakh

3 Answers

3
votes

Let A[i…j] be the subarray consisting of A[i], A[i+1], …, A[j].

Observations:

  • If A[i…j] doesn't satisfy the criterion, then neither does A[i…(j+1)], because 2·min(A[i…(j+1)]) ≤ 2·min(A[i…j]) ≤ max(A[i…j]) ≤ max(A[i…(j+1)]). So you can abort your inner loop as soon as you find a j for which the condition is not satisfied.
  • If we've already found a subarray of length L that meets the criterion, then there's no need to consider any subarray with length ≤ L. So you can start your inner loop with j = i + maxLength rather than j = i + 1. (Of course, you'll need to initialize maxLength to 0 rather than Integer.MIN_VALUE.)

Combining the above, we have:

int maxLength = 0;
for (int i = 0; i < A.length; ++i) {
    for (int j = i + maxLength; j < A.length; ++j) {
        if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
            // success -- now let's look for a longer subarray:
            maxLength = j - i + 1;
        } else {
            // failure -- keep looking for a subarray this length:
            break;
        }
    }
}

It may not be obvious at first glance, but the inner loop now goes through a total of only O(n) iterations, because j can only take each value at most once. (For example, if i is 3 and maxLength is 5, then j starts at 8. If A[3…8] meets the criterion, we keep incrementing maxLength until we find a subarray that doesn't meet the criterion. Once that happens, we progress from A[i…(i+maxLength)] to A[(i+1)…((i+1)+maxLength)], which means the new loop starts with a greater j than the one where the previous loop left off.)

We can make this more explicit by refactoring a bit to model A[i…j] as a sliding-and-potentially-expanding window: incrementing i removes an element from the left edge of the window, incrementing j adds an element to the right edge of the window, and there's never any need to increment i without also incrementing j:

int maxLength = 0;
int i = 0, j = 0;
while (j < A.length) {
    if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
        // success -- now let's look for a longer subarray:
        maxLength = j - i + 1;
        ++j;
    } else {
        // failure -- keep looking for a subarray this length:
        ++i;
        ++j;
    }
}

or, if you prefer:

int maxLength = 0;
int i = 0;
for (int j = 0; j < A.length; ++j) {
    if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
        // success -- now let's look for a longer subarray:
        maxLength = j - i + 1;
    } else {
        // failure -- keep looking for a subarray this length:
        ++i;
    }
}

Since in your solution, the inner loop iterates a total of O(n²) times, and you've stated that your solution runs in O(n²) time, we could argue that, since the above has the inner loop iterate only O(n) times, the above must run in O(n) time.

The problem is, that premise is really very questionable; you haven't indicated how you would implement findMin and findMax, but the straightforward implementation would take O(j−i) time, such that your solution actually runs in O(n³) rather than O(n²). So if we reduce the number of inner loop iterations from O(n²) to O(n), that just brings the total time complexity down from O(n³) to O(n²).

But, as it happens, it is possible to calculate the min and max of these subarrays in amortized O(1) time and O(n) extra space, using "Method 3" at https://www.geeksforgeeks.org/sliding-window-maximum-maximum-of-all-subarrays-of-size-k/. (Hat-tip to גלעד ברקן for pointing this out.)

The way it works is, you maintain two deques, minseq for calculating min and maxseq for calculating max. (I'll only explain minseq; maxseq is analogous.) At any given time, the first element (head) of minseq is the index of the min element in A[i…j]; the second element of minseq is the index of the min element after the first element; and so on. (So, for example, if the subarray is [80,10,30,60,50] starting at index #2, then minseq will be [3,4,6], those being the indices of the subsequence [10,30,50].) Whenever you increment i, you check if the old value of i is the head of minseq (meaning that it's the current min); if so, you remove the head. Whenever you increment j, you repeatedly check if the tail of minseq is the index of an element that's greater than or equal to the element at j; if so, you remove the tail and repeat. Once you've removed all such tail elements, you add j to the tail. Since each index is added to and removed from the deque at most once, this bookkeeping has a total cost of O(n).

That gives you overall O(n) time, as desired.
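
Putting the window loop and the two deques together, a minimal sketch could look like the following. It is written in Python rather than Java for brevity, and the function name `longest_subarray_2min_gt_max` is an assumption made for illustration; `minseq` and `maxseq` play the roles described above.

```python
from collections import deque

def longest_subarray_2min_gt_max(a):
    """Return the length of the longest subarray with 2 * min > max."""
    minseq = deque()  # indices whose values increase from head to tail
    maxseq = deque()  # indices whose values decrease from head to tail
    max_length = 0
    i = 0  # left edge of the window a[i..j]
    for j, x in enumerate(a):
        # Add index j on the right edge, dropping dominated tail entries.
        while minseq and a[minseq[-1]] >= x:
            minseq.pop()
        minseq.append(j)
        while maxseq and a[maxseq[-1]] <= x:
            maxseq.pop()
        maxseq.append(j)

        if 2 * a[minseq[0]] > a[maxseq[0]]:
            # success: the window grows by one element
            max_length = j - i + 1
        else:
            # failure: slide the window, removing index i from the left edge
            if minseq[0] == i:
                minseq.popleft()
            if maxseq[0] == i:
                maxseq.popleft()
            i += 1
    return max_length
```

Each index enters and leaves each deque at most once, so the loop does O(n) work in total. The window never shrinks, which is why only the length (not the window position) is tracked here.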

0
votes

There's a simple O(n log n) time and O(n) space solution since we know the length of the window is bounded, which is to binary search for the window size. This works because any subarray of a valid window is itself valid, so if a window of some size exists, windows of every smaller size exist too. For each chosen window size, we iterate over the array once, and we make O(log n) such traversals. If the window is too large, we won't find a solution and try a window half the size; otherwise we try a window halfway between this and the last unsuccessful window size. (To update the min and max in the sliding window we can use method 3 described here.)
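
A minimal sketch of this binary search, assuming monotonic deques are used for the fixed-size sliding-window min and max. The names `has_valid_window` and `longest_by_binary_search` are hypothetical, chosen only for this example.

```python
from collections import deque

def has_valid_window(a, k):
    """Return True if some window of exactly k elements has 2 * min > max."""
    minseq, maxseq = deque(), deque()  # indices of min / max candidates
    for j, x in enumerate(a):
        while minseq and a[minseq[-1]] >= x:
            minseq.pop()
        minseq.append(j)
        while maxseq and a[maxseq[-1]] <= x:
            maxseq.pop()
        maxseq.append(j)
        # Drop the single index that may have fallen out of a[j-k+1..j].
        if minseq[0] <= j - k:
            minseq.popleft()
        if maxseq[0] <= j - k:
            maxseq.popleft()
        if j >= k - 1 and 2 * a[minseq[0]] > a[maxseq[0]]:
            return True
    return False

def longest_by_binary_search(a):
    """Return the largest window size for which a valid window exists."""
    lo, hi = 0, len(a)  # size lo is known feasible (0 trivially); sizes > hi are not
    while lo < hi:
        mid = (lo + hi + 1) // 2  # always >= 1 inside the loop
        if has_valid_window(a, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Each call to `has_valid_window` is a single O(n) pass, and the search makes O(log n) calls.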

0
votes

Here's an algorithm in O(n lg k) time, where n is the length of the array and k is the length of the maximum subarray having 2 * min > max.

Let A be the array. Let's start with the following invariant: for j between 0 and length A, the subarray SA(j) kept at step j is either empty or satisfies 2 * min > max. It is extremely easy to initialize: take the empty subarray from indices 0 to 0. (Note that SA(j) may be empty because A[j] may be zero or negative: in that case you can't have 2 * min > max, because min >= 2 * min > max is impossible.)

The algorithm is: for each j, we set SA(j) = SA(j-1) + A[j]. But if A[j] >= 2 * min(SA(j-1)), then the invariant is broken. To restore the invariant, we have to remove all the elements e from SA(j) that meet A[j] >= 2 * e. In the same way, the invariant is broken if 2 * A[j] <= max(SA(j-1)). To restore the invariant, we have to remove all the elements e from SA(j) that meet 2 * A[j] <= e.

On the fly, we keep a track of the longest SA(j) found and return it.

Hence the algorithm:

SA(0) <- A[0..1] # 1 excluded -> empty subarray
ret <- SA(0)
for j in 1..length(A):
    if A[j] >= 2 * min(SA(j-1)):
        i <- the last index having A[j] >= 2 * A[i]
        SA(j) <- A[i+1..j+1]
    else if 2 * A[j] <= max(SA(j-1)):
        i <- the last index having 2 * A[j] <= A[i]
        SA(j) <- A[i+1..j+1]
    if length(SA(j)) > length(ret):
        ret <- SA(j)

return ret

The question is: how do we find the last index i having A[j] >= 2 * A[i]? If we iterate over SA(j-1), we need k steps at most, and then the time complexity will be O(n k) (we start with j-1 and look for the last value that keeps the invariant).
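
For reference, the straightforward O(n k) variant with a backward scan might be sketched as follows. This is an illustration under assumptions, not the answer's own code: the function name `longest_by_backward_scan` and the half-open `(start, end)` return value are made up for this example, and both violation cases are checked in a single scan.

```python
def longest_by_backward_scan(a):
    """Return (start, end) so that a[start:end] is a longest subarray
    with 2 * min > max (half-open range; (0, 0) if none exists)."""
    best = (0, 0)
    start = 0  # the current window is a[start:j + 1]
    for j, x in enumerate(a):
        if x <= 0:
            # 2 * x > x is impossible, so no valid window can contain x
            start = j + 1
            continue
        # Look for the last element of the window that is incompatible
        # with x: either too small (x >= 2 * e) or too large (2 * x <= e).
        for i in range(j - 1, start - 1, -1):
            if x >= 2 * a[i] or 2 * x <= a[i]:
                start = i + 1
                break
        if j + 1 - start > best[1] - best[0]:
            best = (start, j + 1)
    return best
```

The inner scan visits at most the current window, so it takes at most k steps per element.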

But there is a better solution. Imagine we have a min heap that stores the elements of SA(j-1) along with their positions. The first element is the minimum of SA(j-1); let i0 be its index. We can remove all elements from the start of SA(j-1) up to and including i0. Now, are we sure that A[j] < 2 * A[i] for all remaining indices i? No: there may be more elements that are too small. Hence we remove the elements one after the other until the invariant is restored.

We'll need a max heap too, to handle the other situation, 2 * A[j] <= max(SA(j-1)).

The easiest approach is to create an ad hoc queue that has the following operations:

  • add(v): add an element v to the queue
  • remove_until_min_gt(v): remove elements from start of the queue until the minimum is greater than v
  • remove_until_max_lt(v): remove elements from start of the queue until the maximum is less than v
  • maximum: get the maximum of the queue
  • minimum: get the minimum of the queue

With two heaps, maximum and minimum are O(1), but the other operations are O(lg k).

Here is a Python implementation that keeps the indices of the start and the end of the queue:

import heapq

class Queue:
    def __init__(self):
        self._i = 0 # start in A
        self._j = 0 # end in A
        self._minheap = []
        self._maxheap = []

    def add(self, value):
        # store the value and the indices in both heaps
        heapq.heappush(self._minheap, (value, self._j))
        heapq.heappush(self._maxheap, (-value, self._j))
        # update the index in A
        self._j += 1

    def remove_until_min_gt(self, v):
        return self._remove_until(self._minheap, lambda x: x > v)

    def remove_until_max_lt(self, v):
        return self._remove_until(self._maxheap, lambda x: -x < v)

    def _remove_until(self, heap, check):
        while heap and not check(heap[0][0]):
            j = heapq.heappop(heap)[1]
            if self._i < j + 1:
                self._i = j + 1 # update the start index
        # remove front elements before the start index
        # there may remain elements before the start index in the heaps,
        # but the first element is after the start index.
        while self._minheap and self._minheap[0][1] < self._i:
            heapq.heappop(self._minheap)
        while self._maxheap and self._maxheap[0][1] < self._i:
            heapq.heappop(self._maxheap)

    def minimum(self):
        return self._minheap[0][0]

    def maximum(self):
        return -self._maxheap[0][0]

    def __repr__(self):
        ns = [v for v, _ in self._minheap]
        return f"Queue({ns})"

    def __len__(self):
        return self._j - self._i

    def from_to(self):
        return self._i, self._j

def find_min_twice_max_subarray(A):
    queue = Queue()
    best_len = 0
    best = (0, 0)
    for v in A:
        queue.add(v)
        if 2 * v <= queue.maximum():
            queue.remove_until_max_lt(2 * v)  # keep only elements e with e < 2 * v
        elif v >= 2 * queue.minimum():
            queue.remove_until_min_gt(v/2)
        if len(queue) > best_len:
            best_len = len(queue)
            best = queue.from_to()

    return best

You can see that each element of A is pushed onto each heap once and popped from it at most once, hence the O(n lg k) time complexity.

Here's a test.

import random
A = [random.randint(-10, 20) for _ in range(25)]
print(A)
# [18, -4, 14, -9, 8, -6, 12, 13, -7, 7, -2, 14, 7, 9, -9, 9, 20, 19, 14, 13, 14, 14, 2, -8, -2]
print(A[slice(*find_min_twice_max_subarray(A))])
# [20, 19, 14, 13, 14, 14]
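
A single random array only exercises a few cases, so it may help to compare against a brute-force reference on many inputs. The sketch below is one possible O(n²) reference; the name `brute_force_longest` is an assumption, and it returns only the length, tracking min and max incrementally as suggested in the comments on the question.

```python
def brute_force_longest(a):
    """O(n^2) reference: length of the longest subarray with 2 * min > max."""
    best = 0
    for i in range(len(a)):
        lo = hi = a[i]
        for j in range(i, len(a)):
            # update min and max of a[i..j] incrementally
            lo = min(lo, a[j])
            hi = max(hi, a[j])
            if 2 * lo <= hi:
                break  # extending a failing subarray cannot make it valid
            best = max(best, j - i + 1)
    return best
```

Its result can be compared with the length of the slice returned by `find_min_twice_max_subarray` over many random arrays.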

Obviously, if there were a way to find the start index that restores the invariant in O(1), we would have a time complexity of O(n). (This reminds me of how the KMP algorithm finds the best new start in a string matching problem, but I don't know if it is possible to create something similar here.)