12
votes

I wrote this function that does this (easier to show than explain):

(split 2 (list 1 2 3 4 5 6))

=> ((1 2) (2 3) (3 4) (4 5) (5 6))

(defn split [n xs] 
  (if (> (count xs) (dec n))
      (cons (take n xs) (split n (rest xs)))
      '()))

I understand that in Clojure the list is not the only first class data structure. Would it make sense to write this data structure-agnostic? And regardless, is my implementation the most efficient and if not, how would I make it more efficient and/or idiomatic?

Thanks!

4

4 Answers

21
votes

You can use the built in partition function,

(partition 2 1 (list 1 2 3 4 5 6))
=> ((1 2) (2 3) (3 4) (4 5) (5 6))

works for any sequence.


clojure.core/partition
([n coll] [n step coll] [n step pad coll])
  Returns a lazy sequence of lists of n items each, at offsets step
  apart. If step is not supplied, defaults to n, i.e. the partitions
  do not overlap. If a pad collection is supplied, use its elements as
  necessary to complete last partition upto n items. In case there are
  not enough padding elements, return a partition with less than n items.

5
votes

No need to write your own implementation. Clojure provides partition, which is lazy. Also no need to use list, if you use just Number literals:

 (partition 2 '(1 2 3 4 5 6)) 
5
votes

You can create a lazy sequence out of your version:

  (defn split [n xs]
     (lazy-seq
         (let [m (take n xs)]
           (if (= n (count m))
             (cons m (split n (rest xs)))))))

(the reason for the different condition than your '(if (> (count xs) (dec n))' is because its more efficient to count M elements out of XS instead of counting the entire XS collection every time (which is kind of against the lazyness, because we dont want to walk the entire collection)

Imagine what it would had been like counting the elements in the monstrous range every iteration:)

  (take 10 (split 2 (range 100000000000)))

    => ((0 1) (1 2) (2 3)...)
2
votes

I've been using Clojure for about a month, so I'm probably not qualified to appoint the most idiomatic way ;)

But your implementation is short and to the point (ignoring that it also duplicates the built-in function partition as already mentioned).

The implementation is already fairly data structure agnostic - since it uses sequence operations, it works with all the standard data structures:

(split 2 [1 2 3 4 5 6])
=> ((1 2) (2 3) (3 4) (4 5) (5 6))

(split 2 #{1 2 3 4 5 6})
=> ((1 2) (2 3) (3 4) (4 5) (5 6))

(split 2 {1 :a 2 :b 3 :c 4 :d})
=> (([1 :a] [2 :b]) ([2 :b] [3 :c]) ([3 :c] [4 :d]))

(split 2 "abcd")
=> ((\a \b) (\b \c) (\c \d))

The primary limitation of using plain recursion is that you are limited by the size of the stack:

(split 2 (range 10000))
=> java.lang.StackOverflowError

So if you are expecting input sizes much above 1k, it's better to use loop/recur, which doesn't use the stack:

(defn split-loop [n coll]
  (loop [elms coll res [] ]
    (if (< (count elms) n)
      res
      (recur (next elms) (conj res (take n elms))))))