Here is a solution that doesn't rely on complex math as sdcvvc's/Dimitris Andreou's answers do, doesn't change the input array as caf and Colonel Panic did, and doesn't use the bitset of enormous size as Chris Lercher, JeremyP and many others did. Basically, I began with Svalorzen's/Gilad Deutch's idea for Q2, generalized it to the common case Qk and implemented in Java to prove that the algorithm works.
The idea
Suppose we have an arbitrary interval I of which we only know that it contains at least one of the missing numbers. After one pass through the input array, looking only at the numbers from I, we can obtain both the sum S and the quantity Q of missing numbers from I. We do this by simply decrementing I's length each time we encounter a number from I (for obtaining Q) and by decreasing pre-calculated sum of all numbers in I by that encountered number each time (for obtaining S).
Now we look at S and Q. If Q = 1, it means that then I contains only one of the missing numbers, and this number is clearly S. We mark I as finished (it is called "unambiguous" in the program) and leave it out from further consideration. On the other hand, if Q > 1, we can calculate the average A = S / Q of missing numbers contained in I. As all numbers are distinct, at least one of such numbers is strictly less than A and at least one is strictly greater than A. Now we split I in A into two smaller intervals each of which contains at least one missing number. Note that it doesn't matter to which of the intervals we assign A in case it is an integer.
We make the next array pass calculating S and Q for each of the intervals separately (but in the same pass) and after that mark intervals with Q = 1 and split intervals with Q > 1. We continue this process until there are no new "ambiguous" intervals, i.e. we have nothing to split because each interval contains exactly one missing number (and we always know this number because we know S). We start out from the sole "whole range" interval containing all possible numbers (like [1..N] in the question).
Time and space complexity analysis
The total number of passes p we need to make until the process stops is never greater than the missing numbers count k. The inequality p <= k can be proved rigorously. On the other hand, there is also an empirical upper bound p < log2N + 3 that is useful for large values of k. We need to make a binary search for each number of the input array to determine the interval to which it belongs. This adds the log k multiplier to the time complexity.
In total, the time complexity is O(N ᛫ min(k, log N) ᛫ log k). Note that for large k, this is significantly better than that of sdcvvc/Dimitris Andreou's method, which is O(N ᛫ k).
For its work, the algorithm requires O(k) additional space for storing at most k intervals, that is significantly better than O(N) in "bitset" solutions.
Java implementation
Here's a Java class that implements the above algorithm. It always returns a sorted array of missing numbers. Besides that, it doesn't require the missing numbers count k because it calculates it in the first pass. The whole range of numbers is given by the minNumber
and maxNumber
parameters (e.g. 1 and 100 for the first example in the question).
public class MissingNumbers {
private static class Interval {
boolean ambiguous = true;
final int begin;
int quantity;
long sum;
Interval(int begin, int end) { // begin inclusive, end exclusive
this.begin = begin;
quantity = end - begin;
sum = quantity * ((long)end - 1 + begin) / 2;
}
void exclude(int x) {
quantity--;
sum -= x;
}
}
public static int[] find(int minNumber, int maxNumber, NumberBag inputBag) {
Interval full = new Interval(minNumber, ++maxNumber);
for (inputBag.startOver(); inputBag.hasNext();)
full.exclude(inputBag.next());
int missingCount = full.quantity;
if (missingCount == 0)
return new int[0];
Interval[] intervals = new Interval[missingCount];
intervals[0] = full;
int[] dividers = new int[missingCount];
dividers[0] = minNumber;
int intervalCount = 1;
while (true) {
int oldCount = intervalCount;
for (int i = 0; i < oldCount; i++) {
Interval itv = intervals[i];
if (itv.ambiguous)
if (itv.quantity == 1) // number inside itv uniquely identified
itv.ambiguous = false;
else
intervalCount++; // itv will be split into two intervals
}
if (oldCount == intervalCount)
break;
int newIndex = intervalCount - 1;
int end = maxNumber;
for (int oldIndex = oldCount - 1; oldIndex >= 0; oldIndex--) {
// newIndex always >= oldIndex
Interval itv = intervals[oldIndex];
int begin = itv.begin;
if (itv.ambiguous) {
// split interval itv
// use floorDiv instead of / because input numbers can be negative
int mean = (int)Math.floorDiv(itv.sum, itv.quantity) + 1;
intervals[newIndex--] = new Interval(mean, end);
intervals[newIndex--] = new Interval(begin, mean);
} else
intervals[newIndex--] = itv;
end = begin;
}
for (int i = 0; i < intervalCount; i++)
dividers[i] = intervals[i].begin;
for (inputBag.startOver(); inputBag.hasNext();) {
int x = inputBag.next();
// find the interval to which x belongs
int i = java.util.Arrays.binarySearch(dividers, 0, intervalCount, x);
if (i < 0)
i = -i - 2;
Interval itv = intervals[i];
if (itv.ambiguous)
itv.exclude(x);
}
}
assert intervalCount == missingCount;
for (int i = 0; i < intervalCount; i++)
dividers[i] = (int)intervals[i].sum;
return dividers;
}
}
For fairness, this class receives input in form of NumberBag
objects. NumberBag
doesn't allow array modification and random access and also counts how many times the array was requested for sequential traversing. It is also more suitable for large array testing than Iterable<Integer>
because it avoids boxing of primitive int
values and allows wrapping a part of a large int[]
for a convenient test preparation. It is not hard to replace, if desired, NumberBag
by int[]
or Iterable<Integer>
type in the find
signature, by changing two for-loops in it into foreach ones.
import java.util.*;
public abstract class NumberBag {
private int passCount;
public void startOver() {
passCount++;
}
public final int getPassCount() {
return passCount;
}
public abstract boolean hasNext();
public abstract int next();
// A lightweight version of Iterable<Integer> to avoid boxing of int
public static NumberBag fromArray(int[] base, int fromIndex, int toIndex) {
return new NumberBag() {
int index = toIndex;
public void startOver() {
super.startOver();
index = fromIndex;
}
public boolean hasNext() {
return index < toIndex;
}
public int next() {
if (index >= toIndex)
throw new NoSuchElementException();
return base[index++];
}
};
}
public static NumberBag fromArray(int[] base) {
return fromArray(base, 0, base.length);
}
public static NumberBag fromIterable(Iterable<Integer> base) {
return new NumberBag() {
Iterator<Integer> it;
public void startOver() {
super.startOver();
it = base.iterator();
}
public boolean hasNext() {
return it.hasNext();
}
public int next() {
return it.next();
}
};
}
}
Tests
Simple examples demonstrating the usage of these classes are given below.
import java.util.*;
public class SimpleTest {
public static void main(String[] args) {
int[] input = { 7, 1, 4, 9, 6, 2 };
NumberBag bag = NumberBag.fromArray(input);
int[] output = MissingNumbers.find(1, 10, bag);
System.out.format("Input: %s%nMissing numbers: %s%nPass count: %d%n",
Arrays.toString(input), Arrays.toString(output), bag.getPassCount());
List<Integer> inputList = new ArrayList<>();
for (int i = 0; i < 10; i++)
inputList.add(2 * i);
Collections.shuffle(inputList);
bag = NumberBag.fromIterable(inputList);
output = MissingNumbers.find(0, 19, bag);
System.out.format("%nInput: %s%nMissing numbers: %s%nPass count: %d%n",
inputList, Arrays.toString(output), bag.getPassCount());
// Sieve of Eratosthenes
final int MAXN = 1_000;
List<Integer> nonPrimes = new ArrayList<>();
nonPrimes.add(1);
int[] primes;
int lastPrimeIndex = 0;
while (true) {
primes = MissingNumbers.find(1, MAXN, NumberBag.fromIterable(nonPrimes));
int p = primes[lastPrimeIndex]; // guaranteed to be prime
int q = p;
for (int i = lastPrimeIndex++; i < primes.length; i++) {
q = primes[i]; // not necessarily prime
int pq = p * q;
if (pq > MAXN)
break;
nonPrimes.add(pq);
}
if (q == p)
break;
}
System.out.format("%nSieve of Eratosthenes. %d primes up to %d found:%n",
primes.length, MAXN);
for (int i = 0; i < primes.length; i++)
System.out.format(" %4d%s", primes[i], (i % 10) < 9 ? "" : "\n");
}
}
Large array testing can be performed this way:
import java.util.*;
public class BatchTest {
private static final Random rand = new Random();
public static int MIN_NUMBER = 1;
private final int minNumber = MIN_NUMBER;
private final int numberCount;
private final int[] numbers;
private int missingCount;
public long finderTime;
public BatchTest(int numberCount) {
this.numberCount = numberCount;
numbers = new int[numberCount];
for (int i = 0; i < numberCount; i++)
numbers[i] = minNumber + i;
}
private int passBound() {
int mBound = missingCount > 0 ? missingCount : 1;
int nBound = 34 - Integer.numberOfLeadingZeros(numberCount - 1); // ceil(log_2(numberCount)) + 2
return Math.min(mBound, nBound);
}
private void error(String cause) {
throw new RuntimeException("Error on '" + missingCount + " from " + numberCount + "' test, " + cause);
}
// returns the number of times the input array was traversed in this test
public int makeTest(int missingCount) {
this.missingCount = missingCount;
// numbers array is reused when numberCount stays the same,
// just Fisher–Yates shuffle it for each test
for (int i = numberCount - 1; i > 0; i--) {
int j = rand.nextInt(i + 1);
if (i != j) {
int t = numbers[i];
numbers[i] = numbers[j];
numbers[j] = t;
}
}
final int bagSize = numberCount - missingCount;
NumberBag inputBag = NumberBag.fromArray(numbers, 0, bagSize);
finderTime -= System.nanoTime();
int[] found = MissingNumbers.find(minNumber, minNumber + numberCount - 1, inputBag);
finderTime += System.nanoTime();
if (inputBag.getPassCount() > passBound())
error("too many passes (" + inputBag.getPassCount() + " while only " + passBound() + " allowed)");
if (found.length != missingCount)
error("wrong result length");
int j = bagSize; // "missing" part beginning in numbers
Arrays.sort(numbers, bagSize, numberCount);
for (int i = 0; i < missingCount; i++)
if (found[i] != numbers[j++])
error("wrong result array, " + i + "-th element differs");
return inputBag.getPassCount();
}
public static void strideCheck(int numberCount, int minMissing, int maxMissing, int step, int repeats) {
BatchTest t = new BatchTest(numberCount);
System.out.println("╠═══════════════════════╬═════════════════╬═════════════════╣");
for (int missingCount = minMissing; missingCount <= maxMissing; missingCount += step) {
int minPass = Integer.MAX_VALUE;
int passSum = 0;
int maxPass = 0;
t.finderTime = 0;
for (int j = 1; j <= repeats; j++) {
int pCount = t.makeTest(missingCount);
if (pCount < minPass)
minPass = pCount;
passSum += pCount;
if (pCount > maxPass)
maxPass = pCount;
}
System.out.format("║ %9d %9d ║ %2d %5.2f %2d ║ %11.3f ║%n", missingCount, numberCount, minPass,
(double)passSum / repeats, maxPass, t.finderTime * 1e-6 / repeats);
}
}
public static void main(String[] args) {
System.out.println("╔═══════════════════════╦═════════════════╦═════════════════╗");
System.out.println("║ Number count ║ Passes ║ Average time ║");
System.out.println("║ missimg total ║ min avg max ║ per search (ms) ║");
long time = System.nanoTime();
strideCheck(100, 0, 100, 1, 20_000);
strideCheck(100_000, 2, 99_998, 1_282, 15);
MIN_NUMBER = -2_000_000_000;
strideCheck(300_000_000, 1, 10, 1, 1);
time = System.nanoTime() - time;
System.out.println("╚═══════════════════════╩═════════════════╩═════════════════╝");
System.out.format("%nSuccess. Total time: %.2f s.%n", time * 1e-9);
}
}
Try them out on Ideone
XOR
of all numbers from1
ton
, then xoring result with all numbers in the given array. In the end you have your missing number. In this solution you don't need to care about overflow as in summing up. – sbeliakov