3
votes

I have a function that returns a seq of characters, and I need to modify it so that metadata can be attached to some (but not all) of the characters. Clojure doesn't support 'with-meta' on characters: they are java.lang.Character instances, which don't implement IObj. So the possible options are:

  • return a seq of vectors of [character, metadata];

    pros: simplicity, data and metadata are tied together
    cons: need to extract data from vector
     
  • return two separate seqs, one for characters and one for metadata; the caller must iterate over both simultaneously if they care about metadata;

    pros: caller is not forced to extract data from each stream element and may throw away the metadata seq if they wish
    cons: need to iterate both seqs at once, more complexity on the caller's side if metadata is needed
     
  • introduce a wrapper record that holds one character and can have metadata attached to it (Clojure records implement IMeta);

    pros: data and metadata are tied together
    cons: need to extract data from record
     
  • your better option.

Which approach is better?


1 Answer

1
votes

Using a sequence of maps or vectors, e.g.

({:char \x :meta <...>} {:char \y :meta <...>} {:char \z :meta <...>} ...)
; or
([\x <...>] [\y <...>] [\z <...>] ...)

looks like the best option to me; that's what I'd do myself if I had such a task. Writing a function that transforms such a sequence back into a sequence of chars is then very simple:

(defn characters [s] (map :char s))
; or
(defn characters [s] (map first s))
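To make this concrete, here is a minimal sketch of the map-based variant. The sample data and the :pos key are made up for illustration; only the 'characters' function comes from the snippet above.

```clojure
;; Hypothetical sample data: each element pairs a character with
;; optional metadata (nil when there is nothing to attach).
(def annotated
  [{:char \x :meta {:pos 0}}
   {:char \y :meta nil}
   {:char \z :meta {:pos 2}}])

(defn characters
  "Drops the metadata and returns just the characters."
  [s]
  (map :char s))

(characters annotated)             ; => (\x \y \z)
(apply str (characters annotated)) ; => "xyz"
```

Callers that don't care about metadata only ever need to call 'characters' once at the boundary.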

Iterating through characters and metadata at the same time is also very easy using destructuring bindings:

(doseq [{:keys [char meta]} s] ...)
; or
(doseq [[char meta] s] ...)
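As a rough sketch of how the destructuring looks with real data, the example below filters characters by their metadata. The :highlight key and the 'highlighted' function are assumptions invented for this example. It binds the locals as c and m, since naming them char and meta (as above) shadows the clojure.core functions of the same name inside the body.

```clojure
;; Hypothetical sample data in the vector form: [character metadata].
(def annotated
  [[\x {:highlight true}]
   [\y nil]
   [\z {:highlight true}]])

(defn highlighted
  "Returns the characters whose metadata map has a truthy :highlight."
  [s]
  (for [[c m] s
        :when (:highlight m)]
    c))

(highlighted annotated) ; => (\x \z)
```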

Whether to use a map or a vector is mostly a matter of personal preference.

IMO, using records and their IMeta interface is not quite as good: I think this kind of metadata is mostly intended for language-level code (macros, code preprocessing, syntax extensions, etc.) rather than domain code. Of course, I may be wrong in this assumption.
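For comparison, the record-based option from the question might look roughly like the sketch below. CharBox and its field ch are hypothetical names, not something from the question.

```clojure
;; Hypothetical wrapper record holding a single character.
(defrecord CharBox [ch])

(def plain  (->CharBox \x))
(def tagged (with-meta (->CharBox \y) {:source :user}))

(:ch tagged)  ; => \y
(meta tagged) ; => {:source :user}
(meta plain)  ; => nil

;; Metadata does not take part in equality.
(= tagged (->CharBox \y)) ; => true

;; A bare character cannot carry metadata: with-meta needs an IObj.
(try
  (with-meta \x {:source :user})
  (catch ClassCastException _ :not-supported)) ; => :not-supported
```

Note that the caller still has to unwrap the character with (:ch ...), so this saves nothing over a plain map.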

Using two parallel sequences is the worst option, because it is less convenient for the user of your interface than a single sequence. Throwing away the metadata is very simple with the function I wrote above, and it will not even have noticeable performance implications if all the sequences are lazy.
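A small sketch of the laziness point, assuming a hypothetical infinite source called 'annotated-stream': stripping the metadata with 'map' stays lazy, so the caller can take a few characters without realizing the whole stream.

```clojure
(defn annotated-stream
  "Hypothetical infinite lazy source of [char metadata] pairs,
   cycling through the letters a..z."
  []
  (map (fn [i] [(char (+ 97 (mod i 26))) {:index i}])
       (iterate inc 0)))

(defn characters
  "Drops the metadata from a seq of [char metadata] pairs."
  [s]
  (map first s))

;; Works on an infinite stream because nothing is realized eagerly.
(take 3 (characters (annotated-stream))) ; => (\a \b \c)
```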