33
votes

The common interview problem of determining the missing value in a range from 1 to N has been done a thousand times over. Variations include 2 missing values up to K missing values.

Example problem: Range [1,10] (1 2 4 5 7 8 9 10) = {3,6}

Here is an example of the various solutions:

Easy interview question got harder: given numbers 1..100, find the missing number(s)

My question is that seeing as the simple case of one missing value is of O(n) complexity and that the complexity of the larger cases converge at roughly something larger than O(nlogn):

Couldn't it just be easier to answer the question by saying sort (mergesort) the range and iterate over it observing the missing elements?

This solution should take no more than O(nlogn) and is capable of solving the problem for ranges other than 1-to-N such as 10-to-1000 or -100 to +100 etc...

Is there any reason to believe that the given solutions in the above SO link will be better than the sorting based solution for larger number of missing values?

Note: It seems a lot of the common solutions to this problem, assume an only number theoretic approach. If one is being asked such a question in an S/E interview wouldn't it be prudent to use a more computer science/algorithmic approach, assuming the approach is on par with the number theoretic solution's complexity...

More related links:

8
Sorting or using BitSets are perfectly valid solutions unless the interviewer explicitly specifies that he is looking for a streaming algorithm or that the set requires too much memory.abc
What if you don't even have O(N) memory available? What if you have to implement this on an embedded device with very limited resources, and the input comes in the form of a stream with no random access?gcbenison
The problem with this answer is the OP posted this in questions that specifically require O(K) space only; whereas this answer requires O(N) space. The OP characterized the other answers (some of which are quite good) as "ridiculous answers".WestCoastProjects
The problems make the most sense, when the input is given in a streaming sense: You can't actually store all n items in memory, but they are shown to you one at a time. You only have k^O(1) memory to play with. In this case the sum-of-powers technique makes sense. You can also improve the "time used per number shown", by hashing as in stackoverflow.com/a/36851791/205521Thomas Ahle
@javadba, the original question did ask for O(k) space complexity, but heap sort can manage the problem in O(1) space.Elliott

8 Answers

12
votes

You are only specifying the time complexity, but the space complexity is also important to consider.

The problem complexity can be specified in term of N (the length of the range) and K (the number of missing elements).

In the question you link, the solution of using equations is O(K) in space (or perhaps a bit more ?), as you need one equation per unknown value.

There is also the preservation point: may you alter the list of known elements ? In a number of cases this is undesirable, in which case any solution involving reordering the elements, or consuming them, must first make a copy, O(N-K) in space.

I cannot see faster than a linear solution: you need to read all known elements (N-K) and output all unknown elements (K). Therefore you cannot get better than O(N) in time.

Let us break down the solutions

  • Destroying, O(N) space, O(N log N) time: in-place sort
  • Preserving, O(K) space ?, O(N log N) time: equation system
  • Preserving, O(N) space, O(N) time: counting sort

Personally, though I find the equation system solution clever, I would probably use either of the sorting solutions. Let's face it: they are much simpler to code, especially the counting sort one!

And as far as time goes, in a real execution, I think the "counting sort" would beat all other solutions hands down.

Note: the counting sort does not require the range to be [0, X), any range will do, as any finite range can be transposed to the [0, X) form by a simple translation.

EDIT:

Changed the sort to O(N), one needs to have all the elements available to sort them.

Having had some time to think about the problem, I also have another solution to propose. As noted, when N grows (dramatically) the space required might explode. However, if K is small, then we could change our representation of the list, using intervals:

  • {4, 5, 3, 1, 7}

can be represented as

  • [1,1] U [3,5] U [7,7]

In the average case, maintaining a sorted list of intervals is much less costly than maintaining a sorted list of elements, and it's as easy to deduce the missing numbers too.

The time complexity is easy: O(N log N), after all it's basically an insertion sort.

Of course what's really interesting is that there is no need to actually store the list, thus you can feed it with a stream to the algorithm.

On the other hand, I have quite a hard time figuring out the average space complexity. The "final" space occupied is O(K) (at most K+1 intervals), but during the construction there will be much more missing intervals as we introduce the elements in no particular order.

The worst case is easy enough: N/2 intervals (think odd vs even numbers). I cannot however figure out the average case though. My gut feeling is telling me it should be better than O(N), but I am not that trusting.

1
votes

Whether the given solution is theoretically better than the sorting one depends on N and K. While your solution has complexity of O(N*log(N)), the given solution is O(N*K). I think that the given solution is (same as the sorting solution) able to solve any range [A, B] just by transforming the range [A, B] to [1, N].

1
votes

What about this?

  1. create your own set containing all the numbers
  2. remove the given set of numbers from your set (no need to sort)

What's left in your set are the missing numbers.

1
votes

My question is that seeing as the [...] cases converge at roughly something larger than O(nlogn) [...]

In 2011 (after you posted this question) Caf posted a simple answer that solves the problem in O(n) time and O(k) space [where the array size is n - k].

Importantly, unlike in other solutions, Caf's answer has no hidden memory requirements (using bit array's, adding numbers to elements, multiplying elements by -1 - these would all require O(log(n)) space).

Note: The question here (and the original question) didn't ask about the streaming version of the problem, and the answer here doesn't handle that case.


Regarding the other answers: I agree that many of the proposed "solutions" to this problem have dubious complexity claims, and if their time complexities aren't better in some way than either:

  • count sort (O(n) time and space)
  • compare (heap) sort (O(n*log(n)) time, O(1) space)

...then you may as well just solve the problem by sorting.

However, we can get better complexities (and more importantly, genuinely faster solutions):

0
votes

Because the numbers are taken from a small, finite range, they can be 'sorted' in linear time.

All we do is initialize an array of 100 booleans, and for each input, set the boolean corresponding to each number in the input, and then step through reporting the unset booleans.

0
votes

If there are total N elements where each number x is such that 1 <= x <= N then we can solve this in O(nlogn) time complexity and O(1) space complexity.

  1. First sort the array using quicksort or mergesort.
  2. Scan through the sorted array and if the difference between previously scanned number, a and current number, b is equal to 2 (b - a = 2), then the missing number is a+1. This can be extended to condition where (b - a > 2).

Time complexity is O(nlogn)+O(n) almost equal to O(nlogn) when N > 100.

-1
votes

If the range is given to you well ahead, in this case range is [1,10] you can perform XOR operation with your range and the numbers given to you. Since XOR is commutative operation. You will be left with {3,6}

(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) ={3,6}

-1
votes

I already answered it HERE

You can also create an array of boolean of the size last_element_in_the_existing_array + 1.

In a for loop mark all the element true that are present in the existing array.

In another for loop print the index of the elements which contains false AKA The missing ones.

Time Complexity: O(last_element_in_the_existing_array)

Space Complexity: O(array.length)