9
votes

I would like to "chunk" a seq into subseqs the same as partition-by, except that the function is not applied to each individual element, but to a range of elements.

So, for example:

(gather (fn [a b] (> (- b a) 2)) 
        [1 4 5 8 9 10 15 20 21])

would result in:

[[1] [4 5] [8 9 10] [15] [20 21]]

Likewise:

(defn f [a b] (> (- b a) 2))
(gather f [1 2 3 4]) ;; => [[1 2 3] [4]]
(gather f [1 2 3 4 5 6 7 8 9]) ;; => [[1 2 3] [4 5 6] [7 8 9]]

The idea is that I apply the start of the list and the next element to the function, and if the function returns true we partition the current head of the list up to that point into a new partition.

I've written this:

(defn gather
  [pred? lst]
  (loop [acc [] cur [] l lst]
    (let [a (first cur)
          b (first l)
          nxt (conj cur b)
          rst (rest l)]
      (cond
       (empty? l) (conj acc cur)
       (empty? cur) (recur acc nxt rst)
       ((complement pred?) a b) (recur acc nxt rst)
       :else (recur (conj acc cur) [b] rst)))))

and it works, but I know there's a simpler way. My question is:

Is there a built in function to do this where this function would be unnecessary? If not, is there a more idiomatic (or simpler) solution that I have overlooked? Something combining reduce and take-while?

Thanks.

5
this is the kind of problem that when I see it, I also think of a long solution, but I know that a clojure expert would have it solved in about 2 lines of codejohnbakers

5 Answers

12
votes

Original interpretation of question

We (all) seemed to have misinterpreted your question as wanting to start a new partition whenever the predicate held for consecutive elements.

Yet another, lazy, built on partition-by

(defn partition-between [pred? coll] 
  (let [switch (reductions not= true (map pred? coll (rest coll)))] 
    (map (partial map first) (partition-by second (map list coll switch)))))

(partition-between (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))

Actual Question

The actual question asks us to start a new partition whenever pred? holds for the beginning of the current partition and the current element. For this we can just rip off partition-by with a few tweaks to its source.

(defn gather [pred? coll]
  (lazy-seq
   (when-let [s (seq coll)]
     (let [fst (first s)
           run (cons fst (take-while #((complement pred?) fst %) (next s)))]
       (cons run (gather pred? (seq (drop (count run) s))))))))

(gather (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))

(gather (fn [a b] (> (- b a) 2)) [1 2 3 4])
;=> ((1 2 3) (4))

(gather (fn [a b] (> (- b a) 2)) [1 2 3 4 5 6 7 8 9])
;=> ((1 2 3) (4 5 6) (7 8 9))
2
votes

Since you need to have the information from previous or next elements than the one you are currently deciding on, a partition of pairs with a reduce could do the trick in this case.

This is what I came up with after some iterations:

(defn gather [pred s]
  (->> (partition 2 1 (repeat nil) s) ; partition the sequence and if necessary
                                      ; fill the last partition with nils
    (reduce (fn [acc [x :as s]]
              (let [n   (dec (count acc))
                    acc (update-in acc [n] conj x)]
                (if (apply pred s)
                  (conj acc [])
                  acc)))
            [[]])))

(gather (fn [a b] (when (and a b) (> (- b a) 2)))
        [1 4 5 8 9 10 15 20 21])

;= [[1] [4 5] [8 9 10] [15] [20 21]]

The basic idea is to make partitions of the number of elements the predicate function takes, filling the last partition with nils if necessary. The function then reduces each partition by determining if the predicate is met, if so then the first element in the partition is added to the current group and a new group is created. Since the last partition could have been filled with nulls, the predicate has to be modified.

Tow possible improvements to this function would be to let the user:

  1. Define the value to fill the last partition, so the reducing function could check if any of the elements in the partition is this value.
  2. Specify the arity of the predicate, thus allowing to determine the grouping taking into account the current and the next n elements.
1
votes

I wrote this some time ago in useful:

(defn partition-between [split? coll]
  (lazy-seq
   (when-let [[x & more] (seq coll)]
     (lazy-loop [items [x], coll more]
       (if-let [[x & more] (seq coll)]
         (if (split? [(peek items) x])
           (cons items (lazy-recur [x] more))
           (lazy-recur (conj items x) more))
         [items])))))

It uses lazy-loop, which is just a way to write lazy-seq expressions that look like loop/recur, but I hope it's fairly clear.

I linked to a historical version of the function, because later I realized there's a more general function that you can use to implement partition-between, or partition-by, or indeed lots of other sequential functions. These days the implementation is much shorter, but it's less obvious what's going on if you're not familiar with the more general function I called glue:

(defn partition-between [split? coll]
  (glue conj []
        (fn [v x]
          (not (split? [(peek v) x])))
        (constantly false)
        coll))

Note that both of these solutions are lazy, which at the time I'm writing this is not true of any of the other solutions in this thread.

0
votes

Here is one way, with steps split up. It can be narrowed down to fewer statements.

(def l [1 4 5 8 9 10 15 20 21])

(defn reduce_fn [f x y]
  (cond
   (f (last (last x)) y) (conj x [y])
   :else (conj (vec (butlast x)) (conj (last x) y)) )
  )

 (def reduce_fn1 (partial reduce_fn #(> (- %2 %1) 2)))

 (reduce reduce_fn1 [[(first l)]] (rest l))
0
votes

keep-indexed is a wonderful function. Given a function f and a vector lst,

(keep-indexed (fn [idx it] (if (apply f it) idx))
       (partition 2 1 lst)))

(0 2 5 6)

this returns the indices after which you want to split. Let's increment them and tack a 0 at the front:

(cons 0 (map inc (.....)))

(0 1 3 6 7)

Partition these to get ranges:

(partition 2 1 nil (....))

((0 1) (1 3) (3 6) (6 7) (7))

Now use these to generate subvecs:

(map (partial apply subvec lst) ....)

([1] [4 5] [8 9 10] [15] [20 21])

Putting it all together:

(defn gather
  [f lst]
  (let [indices (cons 0 (map inc
                   (keep-indexed (fn [idx it]
                                    (if (apply f it) idx))
                       (partition 2 1 lst))))]
    (map (partial apply subvec (vec lst))
       (partition 2 1 nil indices))))

(gather #(> (- %2 %) 2) '(1 4 5 8 9 10 15 20 21))
([1] [4 5] [8 9 10] [15] [20 21])