3
votes

Clojure beginner here. Here's some code I'm trying to understand, from http://iloveponies.github.io/120-hour-epic-sax-marathon/sudoku.html (one page of a rather nice beginning Clojure course):


Subset sum is a classic problem. Here’s how it goes. You are given:

    a set of numbers, like #{1 2 10 5 7}
    and a number, say 23

and you want to know if there is some subset of the original set that sums up to the target. 
We’re going to solve this by brute force using a backtracking search.

Here’s one way to implement it:

(defn sum [a-seq]
  (reduce + a-seq))

(defn subset-sum-helper [a-set current-set target]
  (if (= (sum current-set) target)
    [current-set]
    (let [remaining (clojure.set/difference a-set current-set)]
      (for [elem remaining
            solution (subset-sum-helper a-set
                                        (conj current-set elem)
                                        target)]
        solution))))

(defn subset-sum [a-set target]
  (subset-sum-helper a-set #{} target))

So the main thing happens inside subset-sum-helper. First of all, always check if we have found 
a valid solution. Here it’s checked with

  (if (= (sum current-set) target)
    [current-set]

If we have found a valid solution, return it in a vector (We’ll see soon why in a vector). Okay, 
so if we’re not done yet, what are our options? Well, we need to try adding some element of 
a-set into current-set and try again. What are the possible elements for this? They are those 
that are not yet in current-set. Those are bound to the name remaining here:

    (let [remaining (clojure.set/difference a-set current-set)]

What’s left is to actually try calling subset-sum-helper with each new set obtainable 
in this way:

      (for [elem remaining
            solution (subset-sum-helper a-set
                                        (conj current-set elem)
                                        target)]
        solution))))

Here first elem gets bound to the elements of remaining one at a time. For each elem, 
solution gets bound to each element of the recursive call

            solution (subset-sum-helper a-set
                                        (conj current-set elem)
                                        target)]

And this is the reason we returned a vector in the base case, so that we can use for 
in this way.

And sure enough, (subset-sum #{1 2 3 4} 4) returns (#{1 3} #{1 3} #{4}).

But why must line 3 of subset-sum-helper return [current-set]? Wouldn't that return a final answer of ([#{1 3}] [#{1 3}] [#{4}])?

I try removing the enclosing brackets in line 3, making the function begin like this:

(defn subset-sum-helper [a-set current-set target]
  (if (= (sum current-set) target)
    current-set
    (let ...

Now (subset-sum #{1 2 3 4} 4) returns (1 3 1 3 4), which makes it look like let accumulates not the three sets #{1 3}, #{1 3}, and #{4}, but rather just the "bare" numbers, giving (1 3 1 3 4).

So subset-sum-helper is using the list comprehension for within a recursive calculation, and I don't understand what's happening. When I try visualizing this recursive calculation, I found myself asking, "So what happens when

(subset-sum-helper a-set
   (conj current-set elem)
   target)

doesn't return an answer because no answer is possible given its starting point?" (My best guess is that it returns [] or something similar.) I don't understand what the tutorial writer meant when he wrote, "And this is the reason we returned a vector in the base case, so that we can use for in this way."

I would greatly appreciate any help you could give me. Thanks!

3
I've updated my answer right nowtangrammer

3 Answers

3
votes

The subset-sum-helper function always returns a sequence of solutions. When the target is not met, the solution body at the end of the for expression enumerates such a sequence. When target is met, there is only one solution to return: the current-set argument. It must be returned as a sequence of one element. There are many ways to do this:

[current-set] ; as given - simplest
(list current-set)
(cons current-set ())
(conj () current-set)
...

If you cause an immediate return from subset-sum-helper (no recursion), you'll see the vector:

=> (subset-sum #{} 0)
[#{}]

Otherwise you'll see a sequence generated by for, which prints like a list:

=> (subset-sum (set (range 1 10)) 7)
(#{1 2 4}
 #{1 2 4}
 #{1 6}
 #{1 2 4}
 #{1 2 4}
 #{2 5}
 #{3 4}
 #{1 2 4}
 #{1 2 4}
 #{3 4}
 #{2 5}
 #{1 6}
 #{7})

When no answer is possible, subset-sum-helper returns an empty sequence:

=> (subset-sum-helper #{2 4 6} #{} 19)
()

Once again, this is printed as though it were a list.

The algorithm has problems:

  • It finds each solution many times - factorial of (count s) times for a solution s.
  • If an adopted element elem overshoots the target, it uselessly tries adding every permutation of the remaining set.

The code is easier to understand if we recast it somewhat.

The recursive call of subset-sum-helper passes the first and third arguments intact. If we use letfn to make this function local to subset-sum, we can do without these arguments: they are picked up from the context. It now looks like this:

(defn subset-sum [a-set target]
  (letfn [(subset-sum-helper [current-set]
             (if (= (reduce + current-set) target)
                 [current-set]
                 (let [remaining (clojure.set/difference a-set current-set)]
                         (for [elem remaining
                               solution (subset-sum-helper (conj current-set elem))]
                             solution))))]
      (subset-sum-helper #{})))

... where the single call to the sum function has been expanded inline.

It is now fairly clear that subset-sum-helper is returning the solutions that include its single current-set argument. The for expression is enumerating, for each element elem of a-set not in the current-set, the solutions containing the current set and the element. And it is doing this in succession for all such elements. So, starting with the empty set, which all solutions contain, it generates all of them.

2
votes

Maybe this explanation helps you:

Firstly we can experiment in a minimal code the expected behaviour (with and without brackets) of the for function but removing the recursion related code

With brackets:

(for [x #{1 2 3}
      y [#{x}]]
  y)
=> (#{1} #{2} #{3})

Without brackets:

(for [x #{1 2 3}
      y #{x}]
  y)
=> (1 2 3)

With brackets and more elements into the brackets*:**

(for [x #{1 2 3}
      y [#{x}  :a :b :c]]
  y)
=> (#{1} :a :b :c #{2} :a :b :c #{3} :a :b :c)

So you need (on this case) the brackets to avoid iterating over the set.

If we dont use the brackets we'll have "x" as binding value for y, and if we use the brackets we'll have #{x} as binding value for y.

In other words the code author needs a set and not iterating over a set as a binding value inside its for. So she put a set into a sequence "[#{x}]"

And summarising
"for" function takes a vector of one or more binding-form/collection-expr pairs So if your "collection-expre" is #{:a} the iteration result will be (:a) but if your "collection-expre" is [#{:a}] the iteration result will be (#{:a})

Sorry for the redundance on my explanations but it's difficult to be clear with these nuances

1
votes

Just for fun, here's a cleaner solution, still using for:

(defn subset-sum [s target]
  (cond
    (neg? target) ()
    (zero? target) (list #{})
    (empty? s) ()
    :else (let [f (first s), ns (next s)]
            (lazy-cat
              (for [xs (subset-sum ns (- target f))] (conj xs f))
              (subset-sum ns target)))))