Find the kth lowest sum of unique pair in an array

Question

Given an array of numbers, each number represents the difficulty of a problem. People standing in a line should choose any two problems to solve. The two chosen problems should be different and the pair of problems should not be picked by anyone previously. Since they know the difficulties they'll choose the pair whose sum of difficulties is minimum.

Find the minimum sum of difficulties for the person whose standing in kth position in the line. i.e kth minimum sum of unique pair from an array.

Approach 1: Brute force approach(O(n²)) to calculate all possible unique sum and store that in an array and sorted the unique sum array to get the kth element.

Approach 2: Sort the difficulties array and choose the minimal elements(for first 4 elements we can have 6 unique pairs. so if k is less than or equal to 6 we can use first 4 elements in the sorted array to find the minimum sum) and did the approach 1 with the minimal array.

These 2 approaches did not solve the timeout cases. Need a solution with improved time efficiency.

Note: Different problem can have same difficulty level(i.e. array can contain duplicate numbers) also and not in sorted order by default.

difficulties = [1,4,3,2,4]

Person comes first chooses: 1+2 = 3
2nd person: 1+3 = 4
3rd person: 1+4 (or) 1+4(since difficulty of two problems are 4) (or) 2+3 = 5
4th person: 2+3 (or) 1+4(based on the previous selection) = 5

Final answer needed is only the minimum sum not the actual elements.

Assume the constraints to be:
2 <= N <= 10⁵
1 <= k <= N*(N-1)/2
1 <= difficulties[i] <= 10⁹

where,
N is the length of the array
k is the position in which the person has to choose the problems

Just to be clear. If we have an array of 5 elements, say A B C D E, then the pairs to choose from are A+B, A+C, A+D, A+E, B+C, B+D, B+E, C+D, C+E, and D+E. So we could have up to 10 people in the line. Is that correct? — user3386109
Could you give me some problem link so that I can try submitting before answering here? Also, for the input array in your example difficulties = [1,4,3,2,4] and the statement "The two problems should be different", means that I cannot make two pair of (1+4) and (1+4)? You count them as different using their actual elements or their indices? — risingStark
@user3386109 K can be and most probably will be greater than N — Manoj kumar

risingStark risingStark · Accepted Answer · 2021-03-16T11:14:12

Assuming k <= n*(n-1)/2. If not, then no answer possible.
We can use binary search to solve the problem. We binary search on the possible sum of pairs.
Here,
low = minimum sum possible i.e. low = difficulties[0] + difficulties[1], and
high = maximum sum possible i.e. high = difficulties[n-1] + difficulties[n-2].

So, mid = low + (high - low)/2
Now, in 1 iteration of binary search we would count the pairs of indices (i, j), i < j such that difficulties[i] + difficulties[j] <= mid. If the count is less than k, low = mid + 1 else if count >= k, high = mid. Now, this one iteration can be done in O(NlogN).

You can do this till (high - low) > 1. So, each time you reduce your search space by half. So, total time complexity would be O(N*logN*logMaxsum) which for N <= 1e6 and difficulties[i] <= 1e18 would run in less than 1s.

Now high can be equal to low or high can be equal to low +1. The answer can be equal to low or high. Now, you just need to solve the problem whether low is a possible sum(can be solved easily in O(N) using Hashing) and no. of pairs of indices (i, j), i < j such that difficulties[i] + difficulties[j] <= low. If both conditions satify then this is your answer. If not then high is the answer.

Running an example testcase:
Lets' consider the initial array, difficulties = [1, 4, 3, 2, 4] and k = 6.
You first sort the array costing us O(NlogN). After sorting difficulties = [1, 2, 3, 4, 4]
All the pairs n*(n-1)/2 = 10 would be:

(1 + 2) => 3
(1 + 3) => 4
(1 + 4) => 5
(1 + 4) => 5
(2 + 3) => 5
(2 + 4) => 6
(2 + 4) => 6
(3 + 4) => 7
(3 + 4) => 7
(4 + 4) => 8

This is more of a pseudocode to understand the running of the logic.

sort(difficulties)
low = difficulties[0] + difficulties[1] // Minimum possible sum
high = difficulties[n-1] + difficulties[n-2] // Maximum possible sum

while(high - low > 1){
    mid = low + (high - low)/2
    count = all pairs (i, j) and i < j such that difficulties[i] + difficulties[j] <= mid.
    if(count < k){
        low = mid +1 
    }else{
        high = mid
    }
}

Iteration 1:
low = 3
high = 8
mid = 5
count = 5 [(1 + 2), (1 + 3), (1 + 4), (1 + 4), (2 + 3)]
count < k, so low = mid + 1 = 6

----------

Iteration 2:
low = 6
high = 8
mid = 7
count = 9 [(1 + 2), (1 + 3), (1 + 4), (1 + 4), (2 + 3), (2 + 4), (2 + 4), (3 + 4), (3 + 4)]
count >= k, so high= mid = 7

Now, while loop stops since high(7) - low(6) = 1.
Now, you need to check if sum 6 is possible and count of all (i, j) >= k. if it is then low is the answer and in this case it is true. So, answer = 6 for k = 6.

To implement the count thing, you can again do a binary search. Choose first index as i then you just need to find the upper bound of mid - difficulties[i] in the array [i+1, n-1]. Then increment i by 1 and repeat the same. So, you go over every index 0 <= i <= n-1 and find its upper bound in the array search space of [i+1, n-1] and this each iteration takes O(NlogN).

To see why is the last step of checking if low or high is a possible sum or not, try running the algorithm for the array difficulties = [10, 40, 30, 20, 40].

UPDATE:
Below is the complete working code with the time complexity of O(N*logN*logMaxsum) including comments for clear understanding of the logic.

#include<bits/stdc++.h>
#define ll long long int

using namespace std;

void solve();
int main(){
    solve();
    return 0;
}

map<int, int> m;
vector<ll> difficulties;
ll countFunction(ll sum){
    /*
    Function to count all the pairs of indices (i, j) such that
    i < j and  (difficulties[i] + difficulties[j]) <= sum
    */
    ll count = 0;

    int n = (int)difficulties.size();
    for(int i=0;i<n-1;i++){
        /*
        Here the outer for loop means that if I choose difficulties[i]
        as the first element of the pair, then the remaining sum is
        m - difficulties[i], so we just need to find the upper_bound of this value
        to find the count of all pairs with sum <= m.
        upper_bound is an in-built function in C++ STL.
        */
        int x= upper_bound(difficulties.begin(), difficulties.end(), sum-difficulties[i]) - (difficulties.begin() + i + 1);
        if(x<=0){
            /*
            We break here because the condition of i < j is violated
            and it will be violated for remaining values of i as well.
            */
            break;
        }
        //cout<<"x = "<<x<<endl;
        count += x;
    }
    return count;
}


bool isPossible(ll sum){
    /*
    Hashing based solution to check if atleast 1 pair with
    a particular exists in the difficultiesay.
    */
    int n = (int) difficulties.size();
    for(int i=0;i<n;i++){
        /*
        Choosing the ith element as first element of pair
        and checking if there exists an element with value = sum - difficulties[i]
        */
        if(difficulties[i] == (sum - difficulties[i])){
            // If the elements are equal then the frequency must be > 1
            if(m[difficulties[i]] > 1){
                return true;
            }
        }else{
            if(m[sum - difficulties[i]] > 0){
                return true;
            }
        }
    }
    return false;
}

void solve(){
    ll i, j, n, k;
    cin>>n>>k;
    difficulties.resize(n);
    m.clear(); // to run multiple test-cases
    for(i=0;i<n;i++){
        cin>>difficulties[i];
        m[difficulties[i]]++;
    }
    sort(difficulties.begin(), difficulties.end());


    // Using binary search on the possible values of sum.
    ll low = difficulties[0] + difficulties[1]; // Lowest possible sum after sorting
    ll high = difficulties[n-1] + difficulties[n-2]; // Highest possible sum after sorting
    while((high-low)>1){
        ll mid = low + (high - low)/2;
        ll count = countFunction(mid);
        //cout<<"Low = "<<low<<" high = "<<high<<" mid = "<<mid<<" count = "<<count<<endl;
        if (k > count){
            low = mid + 1;
        }else{
            high = mid;
        }
    }

    /*
    Now the answer can be low or high and we need to check
    if low or high is a possible sum and does it satisfy the constraints of k.

    For low to be the answer, we need to count the number of pairs with sum <=low.
    If this count is >=k, then low is the answer.
    But we also need to check whether low is a feasible sum.
    */
    if(isPossible(low) && countFunction(low)>=k){
        cout<<low<<endl;
    }else{
        cout<<high<<endl;
    }
}

Find the kth lowest sum of unique pair in an array

1 Answers