3
votes

Question: The input is on a sequential file. The file contains at most 4Billion integers. Find a missing integer.

Solution as per my understanding:

  1. make two temporary files one with leading 0 and the other with leading 1

  2. one of the two MUST( 4.3B pigeon-holes and 4B pigeons) have less than 2B. Pick the file and repeat steps 1 & 2 on the 2nd bit and then on 3rd bit and so on..

what is the end condition of this iteration?

Also, the book mentions the efficiency of the algorithm being O(n) but,

1st iteration => n probe operations
2nd iteration => n/2 probe operations
.
.
.
n + n/2 + n/4 +... 1 => nlogn??

Am I missing something?

2
If you consider that an integer is always represented as 32-bit integer, then the 32 is assumed to be constant, and so it's O(32n) = O(n). Btw, there are discussion on this question: stackoverflow.com/questions/1642474/…justhalf
Is there only 1 missing integer? By that I mean that in a range from N..M, there will be only 1 value missing? If so, then that is indeed a O(n) operation, easy one at that. By N..M I mean that for instance, in the range from 10 through 20 (both inclusive), only 1 of the values from 10 through 20 is missing, not 2, not 3, not 0, only 1.Lasse V. Karlsen
n + n/2 + n/4 + ... + 1 <= 2nchill
The xor of all 32bit integers is zero (also for any other numbers of bits). So if you have (2^32)-1 distinct integers, the xor of all of them is the one that's missing. But it's not very clear from the question whether you have 4 billion or (2^32)-1 integers.harold
@harold "4.3B pigeon-holes and 4B pigeons" implies that we don't have 2^32 integers.Bernhard Barker

2 Answers

4
votes

You'll check both files and pick the one with the fewest elements.

You'll repeat the process until you've gone through all 32 bits and at the end you'll have a file 0 elements. This is where one of the missing numbers was supposed to be. So, if you've been keeping track of the bits you've filtered on thus far, you'll know what the number is supposed to be.

Note that this is to find a (i.e. 'any') missing number. If given an (unordered) sequential list of 4 billion (not 2^32 (4294967296)) integers with one missing, which you have to find, this won't work, as you can cut off the missing integer right in the beginning.

Also:

n + n/2 + n/4 + ... 1 <= 2n

Not n log n.

It's a geometric sequence with a = n, r = 1/2, which can be calculated with the formula:

n (1-(1/2)^m)
-------------
  1 - (1/2)

Since 0 < (1/2)^m < 1 for any positive number m (since 0 < 1/2 < 1), we can say (1-r^m) < 1 and thus we can say the maximum is:

  n.1
-------
1 - 1/2

   n
= ---
  1/2

= 2n
2
votes

If there is only 1 missing value, meaning that you have the following criteria:

  1. File contains all numbers ranging from a lowest value, N up to and including a highest value, M, except for 1 of those numbers.
  2. File does not have to be sorted
  3. There is only 1 of those values missing (just making sure)

Then the solution is quite simple:

ADD or XOR together all the numbers in the file.
ADD or XOR together all the numbers you're supposed to have.
The missing number is either one minus the other (in case of ADD) or one xor the other.

Here is a LINQPad program you can experiment with:

void Main()
{
    var input = new[] { 1, 2, 3, 4, 5, 6, 8, 9, 10 };

    var lowest = input[0];
    var highest = input[0];
    int xor = 0;
    foreach (var value in input)
    {
        lowest = Math.Min(lowest, value);
        highest = Math.Max(highest, value);
        xor ^= value;
    }
    int requiredXor = 0;
    for (int index = lowest; index <= highest; index++)
        requiredXor ^= index;

    var missing = xor ^ requiredXor;
    missing.Dump();
}

Basically, it will:

  1. XOR all values in the file together (value 1)
  2. Find the lowest and highest numbers at the same time
  3. XOR all values from lowest up to highest (value 2)
  4. XOR the two values (value 1 and value 2) together to find the missing value

This method will not detect if the missing value is the lowest value - 1 or highest value + 1, for instance, if the file is supposed to hold 1..10, but is missing 10 or 1, then the above approach will not find it.

This solution is O(2n) (we loop the numbers twice), which translates to O(n).

Here is a more complete example showing both the ADD and the XOR solution (again in LINQPad):

void Main()
{
    var input = new[] { 1, 2, 3, 4, 5, 6, 8, 9, 10 };
    MissingXOR(input).Dump("xor");
    MissingADD(input).Dump("add");
}

public static int MissingXOR(int[] input)
{
    var lowest = input[0];
    var highest = input[0];
    int xor = 0;
    foreach (var value in input)
    {
        lowest = Math.Min(lowest, value);
        highest = Math.Max(highest, value);
        xor ^= value;
    }
    int requiredXor = 0;
    for (int index = lowest; index <= highest; index++)
        requiredXor ^= index;

    return xor ^ requiredXor;
}

public static int MissingADD(int[] input)
{
    var lowest = input[0];
    var highest = input[0];
    int sum = 0;
    foreach (var value in input)
    {
        lowest = Math.Min(lowest, value);
        highest = Math.Max(highest, value);
        sum += value;
    }
    var sumToHighest = (highest * (highest + 1)) / 2;
    var sumToJustBelowLowest = (lowest * (lowest - 1)) / 2;
    int requiredSum =  sumToHighest - sumToJustBelowLowest;
    return requiredSum - sum;
}