
Looking for a solution to the subset sum problem for this specific case (in C#, or at least an algorithm):

1) There are about 1,000 numbers in the set (this might grow to a few thousand)

2) Sums run into the billions

3) The numbers are currency values, so they have a precision of two decimal places (e.g. 2,345.17)

4) Numbers in the set can be both positive and negative (so we're dealing with a net sum)
I then need to repeat this search (with the same set of numbers) but with a different sum, up to 1,000 times. Finally, the whole process runs 1,000 times, so we're looking at 1,000,000 runs in total. The goal is to accomplish that in 2 minutes, which means each run should take no more than 0.12 ms.

Is this feasible?

-Krip

It's not really clear to me what you want. Can you give an example, or a sample implementation (that might be slow)? - CodesInChaos
"finally the whole process runs 1,000 times" --- so each process has a different set of numbers? - Jacob
Jacob, yes, for the outer run of 1,000 times the set of numbers and the sum are different - Krip
Do you need an exact solution, or will an approximation suffice? I highly doubt that you will be able to find an efficient exact algorithm, as the problem is NP-complete. - tskuzzy

1 Answer


I'm going to assume you already know about the pseudo-polynomial DP algorithm, which is pretty much the only remotely tractable way (for optimal answers) to do this for 1,000 elements.

The way the algorithm is usually implemented involves a boolean array whose size is the maximum possible sum, where each index represents whether that sum is reachable. To adapt this to decimals, you'll need to transform your data from decimals to integers (by multiplying by 100). You could also implement this using a set data structure, which may be much easier and more space efficient.

e.g.,

# Build the set of all reachable subset sums.
# Convert the currency values to integer cents first, as suggested above.
lst = [1, 2, 8, 2.3, 214]
cents = [round(x * 100) for x in lst]

sums = {0}                            # the empty subset sums to 0
for c in cents:
    sums |= {s + c for s in sums}     # for each value: either skip it, or add it to every known sum
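
For comparison, here is a sketch of the array-style formulation mentioned above, using a Python integer as a bitmask over reachable sums. It assumes the values have already been scaled to non-negative integer cents (the sample values are made up for illustration; negative values would need an offset equal to the absolute sum of the negatives):

# Bitmask flavour of the same DP: bit s of `reachable` is set iff sum s
# (in cents) can be formed by some subset; bit 0 covers the empty subset.
cents = [100, 200, 800, 230, 21400]

reachable = 1
for c in cents:
    reachable |= reachable << c       # adding c shifts every reachable sum up by c

print((reachable >> 1030) & 1)        # 1, since 800 + 230 = 1030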

Repeating the subset sum algorithm with a different sum shouldn't be an issue - if you follow the DP algorithm, you'll compute all the possible sums once, and then you can recheck that set for each new sum.
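
As a minimal sketch of that reuse (building on the `sums` set from the example above; the target values here are hypothetical and already in integer cents):

targets = [1030, 999, 23017]                   # hypothetical targets, in cents
results = {t: (t in sums) for t in targets}    # each check is an O(1) set lookup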

The real issue is going to be the size of your set, since it will grow as the algorithm progresses. In the worst, pathological case, the size of the set grows exponentially with the number of elements (every subset sum is unique, giving 2^n sums). If there is some overlap between sums, you'll be better off. I'm guessing that for 1,000 elements with a large range of values, though, you might be in trouble.
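
To make that pathological case concrete, here is a small illustration of my own (not from the original post): with values that are distinct powers of two, every subset has a distinct sum, so the set doubles with each element.

demo_vals = [1 << k for k in range(20)]        # 20 distinct powers of two
demo_sums = {0}
for v in demo_vals:
    demo_sums |= {s + v for s in demo_sums}    # the set doubles on every iteration
print(len(demo_sums))                          # 2**20 = 1,048,576 reachable sums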