Google Combinatorial Optimization interview problem

12

votes

I was asked this question in an interview at Google a couple of weeks ago. I didn't quite get the answer, and I was wondering if anyone here could help me out.

You have an array of n elements, each either 0 or 1. You want to split the array into k contiguous subarrays. The size of each subarray can vary between ceil(n/2k) and floor(3n/2k). You can assume that k << n. After you split the array into k subarrays, one element of each subarray is selected uniformly at random.

Devise an algorithm that maximizes the expected sum of the randomly selected elements from the k subarrays. In other words, split the array so that the sum of the expected values of the elements selected from each subarray is as large as possible.

You can assume that n is a power of 2.

Example:

Array: [0,0,1,1,0,0,1,1,0,1,1,0]
n = 12
k = 3
Size of subarrays can be: 2,3,4,5,6

Possible subarrays [0,0,1] [1,0,0,1] [1,0,1,1,0]
Expected Value of the sum of the elements randomly selected from the subarrays: 1/3 + 2/4 + 3/5 = 43/30 ~ 1.4333333 

Optimal split: [0,0,1,1,0,0][1,1][0,1,1,0]
Expected value of optimal split: 1/3 + 1 + 1/2 = 11/6 ~ 1.83333333
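
For anyone who wants to check the arithmetic in the example, here is a small brute-force sketch (not part of the original question). It assumes the selection within each subarray is uniform, so the expected value of a subarray is just its mean; the names `expected_sum` and `brute_force` are made up for illustration. It enumerates every valid split, so it is only practical for tiny inputs.

```python
from fractions import Fraction
from math import ceil, floor

def expected_sum(parts):
    """Sum of the per-subarray means, kept exact with Fraction."""
    return sum(Fraction(sum(p), len(p)) for p in parts)

def brute_force(arr, k):
    """Try every valid split of arr into k contiguous subarrays."""
    n = len(arr)
    lo, hi = ceil(n / (2 * k)), floor(3 * n / (2 * k))

    def splits(start, left):
        # Yield lists of subarrays covering arr[start:] using `left` pieces.
        if left == 0:
            if start == n:
                yield []
            return
        for size in range(lo, hi + 1):
            if start + size <= n:
                for rest in splits(start + size, left - 1):
                    yield [arr[start:start + size]] + rest

    return max(splits(0, k), key=expected_sum)

arr = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
print(expected_sum([[0, 0, 1], [1, 0, 0, 1], [1, 0, 1, 1, 0]]))  # 43/30
best = brute_force(arr, 3)
print(best, expected_sum(best))
```

On the 12-element example this agrees with the numbers above: 43/30 for the first split and 11/6 for the best one.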

algorithm math

Wow I hope the interviewer explained it better than that, otherwise I never want to interview for Google. – Mike Christensen

It was actually pretty close to that. – John Smith

Yeah I think that is the optimal partition. – John Smith

@hatchet: ah, that's more interesting (and a horrible abuse of the term "partition" to my ears as a mathematician). – Stephen Canon

what about [0,0],[1,1],[0,0],[1,1],[0,1],[1,0] - it gives a 0+1+0+1+0.5+0.5 maybe we need to also multiply the percentage by the number of members in the group? – asafrob

6

votes

I think we can solve this problem using dynamic programming.

Basically, we have:

f(i,j) is defined as the maximum sum of all expected values chosen from an array of size i and split into j subarrays. Therefore the solution should be f(n,k).

The recursive equation is:

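f(i,j) = max over x of [ f(i-x,j-1) + sum(i-x+1,i)/x ], where ceil(n/2k) <= x <= floor(3n/2k)

The base case is f(0,0) = 0, and sum(a,b) is the number of ones in positions a through b.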
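
A rough Python sketch of this recurrence, in case it helps. It takes the maximum over all allowed sizes x and treats f(0,0) = 0 as the base case (my reading of the formula, not something the answer spells out). The function name `best_split` and the `choice` table used to recover the actual subarrays are my own additions.

```python
from math import ceil, floor

def best_split(arr, k):
    """Return (best expected sum, list of subarrays), or None if no valid split."""
    n = len(arr)
    lo, hi = ceil(n / (2 * k)), floor(3 * n / (2 * k))
    # prefix[i] = number of ones among the first i elements
    prefix = [0]
    for v in arr:
        prefix.append(prefix[-1] + v)
    NEG = float("-inf")
    # f[j][i]: best value for the first i elements split into j subarrays
    f = [[NEG] * (n + 1) for _ in range(k + 1)]
    # choice[j][i]: size of the last subarray in that best split
    choice = [[0] * (n + 1) for _ in range(k + 1)]
    f[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for x in range(lo, min(hi, i) + 1):
                cand = f[j - 1][i - x] + (prefix[i] - prefix[i - x]) / x
                if cand > f[j][i]:
                    f[j][i] = cand
                    choice[j][i] = x
    if f[k][n] == NEG:
        return None
    # Walk the choice table backwards to rebuild the subarrays.
    parts, i = [], n
    for j in range(k, 0, -1):
        x = choice[j][i]
        parts.append(arr[i - x:i])
        i -= x
    parts.reverse()
    return f[k][n], parts

value, parts = best_split([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0], 3)
print(value, parts)
```

On the example from the question this returns a value of about 1.8333 with the split [0,0,1,1,0,0] [1,1] [0,1,1,0].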

3

votes

I don't know if this is still an open question or not, but it seems like the OP has managed to add enough clarifications that this should be straightforward to solve. At any rate, if I am understanding what you are saying this seems like a fair thing to ask in an interview environment for a software development position.

Here is the basic O(n^2 * k) solution, which should be adequate for small k (as the interviewer specified):

from math import ceil, floor

def best_val(arr, K):
  n = len(arr)
  lo = ceil(n / (2*K))         # minimum box size
  hi = floor((3*n) / (2*K))    # maximum box size
  # psum[i] = sum of arr[:i], so the sum of any box is one subtraction
  psum = [ 0.0 ]
  for x in arr:
    psum.append(psum[-1] + x)
  NEG = float('-inf')
  # tab[s] = best value for arr[s:] split into exactly k boxes (k = 0 here)
  tab = [ NEG for i in range(n) ]
  tab.append(0)
  for k in range(K):
    new = [ NEG ] * (n + 1)
    for s in range(n - (k+1) * lo + 1):
      terms = range(s + lo, min(s + hi, n) + 1)
      new[s] = max( (psum[t] - psum[s]) / (t - s) + tab[t] for t in terms )
    tab = new
  return tab[0]

I used ceil/floor from the math module, but you basically get the idea. The only `tricks' in this version are that it keeps just the previous row of the table, which reduces the memory overhead to O(n) instead of O(n * k), and that it precalculates the partial sums so that computing the expected value of a box is a constant-time operation (thus saving a factor of O(n) from the inner loop).

1

votes

I don't know if anyone is still interested to see the solution for this problem. Just stumbled upon this question half an hour ago and thought of posting my solution(Java). The complexity for this is O(n*K^log10). The proof is a little convoluted so I would rather provide runtime numbers:

n k time(ms)
48 4 25
48 8 265
24 4 20
24 8 33
96 4 51
192 4 143
192 8 343919

The solution is the same old recursive one where given an array, choose the first partition of size ceil(n/2k) and find the best solution recursively for the rest with number of partitions = k -1, then take ceil(n/2k) + 1 and so on.

Code:

import java.util.Date;

public class PartitionOptimization {
public static void main(String[] args) {
    PartitionOptimization p = new PartitionOptimization();
    int[] input = { 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0};
    int splitNum = 3;
    int lowerLim = (int) Math.ceil(input.length / (2.0 * splitNum));        
    int upperLim = (int) Math.floor((3.0 * input.length) / (2.0 * splitNum));
    System.out.println(input.length + " " + lowerLim + " " + upperLim + " " +
            splitNum);
    Date currDate = new Date();
    System.out.println(currDate);       
    System.out.println(p.getMaxPartExpt(input, lowerLim, upperLim,
            splitNum, 0));
    System.out.println(new Date().getTime() - currDate.getTime());
}

public double getMaxPartExpt(int[] input, int lowerLim, int upperLim,
        int splitNum, int startIndex) {
    // Base case: the last partition takes everything that is left, so the
    // remainder must itself be a legal partition size.
    if (splitNum <= 1) {
        int remaining = input.length - startIndex;
        if (remaining < lowerLim || remaining > upperLim)
            return -1;
        return findExpectation(input, startIndex, input.length - 1);
    }
    // Not enough elements left for splitNum partitions of minimum size.
    if (!((input.length - startIndex) / lowerLim >= splitNum))
        return -1;
    double maxExpt = -1; // -1 signals "no valid split found"
    for (int i = startIndex + lowerLim - 1; i < Math.min(startIndex
            + upperLim, input.length); i++) {
        double curExpect = findExpectation(input, startIndex, i);
        double splitExpect = getMaxPartExpt(input, lowerLim, upperLim,
                splitNum - 1, i + 1);
        if (splitExpect >= 0 && (curExpect + splitExpect > maxExpt)) {
            maxExpt = curExpect + splitExpect;
        }
    }
    return maxExpt;
}

public double findExpectation(int[] input, int startIndex, int endIndex) {
    double expectation = 0;
    for (int i = startIndex; i <= endIndex; i++) {
        expectation = expectation + input[i];
    }
    expectation = (expectation / (endIndex - startIndex + 1));
    return expectation;
}
 }

0

votes

Not sure I understand, the algorithm is to split the array in groups, right? The maximum value the sum can have is the number of ones. So split the array in "n" groups of 1 element each and the addition will be the maximum value possible. But it must be something else and I did not understand the problem, that seems too silly.

0

votes

I think this can be solved with dynamic programming. At each possible split location, get the maximum sum if you split at that location and if you don't split at that point. A recursive function and a table to store history might be useful.

sum_i = max{ NumOnesNewPart/NumZerosNewPart * sum(NewPart) + sum(A_i+1, A_end),
                sum(A_0,A_i+1) + sum(A_i+1, A_end)
           }

This might lead to something...

0

votes

I think it's a bad interview question, but it is also an easy problem to solve.

Every integer contributes to the expected value with weight 1/s where s is the size of the set where it has been placed. Therefore, if you guess the sizes of the sets in your partition, you just need to fill the sets with ones starting from the smallest set, and then fill the remaining largest set with zeroes.

You can easily see then that if you have a partition, filled as above, where the sizes of the sets are S_1, ..., S_k and you do a transformation where you remove one item from set S_i and move it to set S_i+1, you have the following cases:

Both S_i and S_i+1 were filled with ones; then the expected value does not change
Both of them were filled with zeroes; then the expected value does not change
S_i contained both 1's and 0's and S_i+1 contains only zeroes; moving 0 to S_i+1 increases the expected value because the expected value of S_i increases
S_i contained 1's and S_i+1 contains both 1's and 0's; moving 1 to S_i+1 increases the expected value because the expected value of S_i+1 increases and S_i remains intact

In all these cases, you can shift an element from S_i to S_i+1, maintaining the filling rule of filling smallest sets with 1's, so that the expected value increases. This leads to the simple algorithm:

Create a partitioning where there is a maximal number of maximum-size arrays and maximal number of minimum-size arrays
Fill the arrays starting from smallest one with 1's
Fill the remaining slots with 0's

0

votes

How about a recursive function:

double BestValue(Array A, int numSplits)
// Returns the best value that would be obtained by splitting 
// into numSplits partitions.

This in turn uses a helper:

// The additional argument is an array of the valid split sizes which 
// is the same for each call.
double BestValueHelper(Array A, int numSplits, Array splitSizes)
{
    double result = 0;
    for splitSize in splitSizes
        // The first split covers indices 0 .. splitSize-1 (inclusive).
        double splitResult = ExpectedValue(A, 0, splitSize-1) + 
                          BestValueHelper(A+splitSize, numSplits-1, splitSizes);
        if splitResult > result
            result = splitResult;
    return result;
}

ExpectedValue(Array A, int l, int m) computes the expected value of a split of A that goes from l to m i.e. (A[l] + A[l+1] + ... A[m]) / (m-l+1).

BestValue calls BestValueHelper after computing the array of valid split sizes between ceil(n/2k) and floor(3n/2k).

I have omitted error handling and some end conditions but those should not be too difficult to add.

0

votes

Let

a[] = given array of length n
from = inclusive index of array a
k = number of required splits
minSize = minimum size of a split
maxSize = maximum size of a split
d = maxSize - minSize

expectation(a, from, to) = average of all elements of array a from index "from" to index "to" (both inclusive)

Optimal(a[], from, k) = MAX[ for (minSize-1 <= j <= maxSize-1) { expectation(a, from, from+j) + Optimal(a, from+j+1, k-1) } ]

Runtime (assuming memoization or dp) = O(n*k*d)
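
A minimal memoized sketch of this recursion in Python, assuming the recursive call continues from the element right after the current split and that Optimal with k = 0 is 0 only when nothing is left. The names `optimal` and `solve` and the feasibility pruning are my own additions, not part of the answer.

```python
from functools import lru_cache
from math import ceil, floor

def optimal(a, k):
    n = len(a)
    min_size, max_size = ceil(n / (2 * k)), floor(3 * n / (2 * k))
    # prefix[i] = number of ones in a[:i], so each average costs O(1)
    prefix = [0]
    for v in a:
        prefix.append(prefix[-1] + v)

    @lru_cache(maxsize=None)
    def solve(start, parts):
        remaining = n - start
        if parts == 0:
            return 0.0 if remaining == 0 else float("-inf")
        # Prune states that cannot be completed with legal sizes.
        if remaining < parts * min_size or remaining > parts * max_size:
            return float("-inf")
        best = float("-inf")
        for size in range(min_size, min(max_size, remaining) + 1):
            mean = (prefix[start + size] - prefix[start]) / size
            best = max(best, mean + solve(start + size, parts - 1))
        return best

    return solve(0, k)

print(optimal([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0], 3))
```

There are at most n*k distinct (start, parts) states and each one tries at most d+1 sizes, which lines up with the O(n*k*d) estimate.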
