Is there any disadvantage of using sequence-length, sequence-ref, sequence-map etc rather than different functions for lists (length list-ref etc), strings (string-length, string-ref etc), vectors etc in Racket?
1 Answers
Performance.
Consider this tiny benchmark:
#lang racket/base
(require racket/sequence)
(define len 10000)
(define vec (make-vector len))
(collect-garbage)
(collect-garbage)
(collect-garbage)
(time (void (for/list ([i (in-range len)])
(vector-ref vec i))))
(collect-garbage)
(collect-garbage)
(collect-garbage)
(time (void (for/list ([i (in-range len)])
(sequence-ref vec i))))
This is the output on my machine:
; vectors (vector-ref vs sequence-ref)
cpu time: 1 real time: 1 gc time: 0
cpu time: 2082 real time: 2081 gc time: 0
Yes, that’s a difference of 3 orders of magnitude.
Why? Well, racket/sequence
is not a terribly “smart” API, and even though vectors are random access, sequence-ref
is not. Combined with the ability of the Racket optimizer to heavily optimize primitive operations, the sequence API is a pretty poor interface.
Of course, this is a little unfair, because vectors are random access while things like lists are not. However, performing the exact same test as the one above but using lists instead of vectors still yields a pretty grim result:
; lists (list-ref vs sequence-ref)
cpu time: 113 real time: 113 gc time: 0
cpu time: 1733 real time: 1732 gc time: 0
The sequence API is slow, mostly because of a high level of indirection.
Now, performance alone is not a reason to reject an API outright, since there are concrete advantages to working at a higher level of abstraction. That said, I think the sequence API is not a good abstraction, because it:
…is needlessly stateful in its implementation, which puts an unnecessary burden on implementors of the interface.
…does not accommodate things that do not resemble lists, such as random-access vectors or hash tables.
If you want to work with a higher level API, one possible option is to use the collections
package, which attempts to provide an API similar to racket/sequence
, but accommodating more kinds of data structures and also having a more complete set of functions. Disclaimer: I am the author of the collections
package.
Given the above benchmark once more, the performance is still worse than using the underlying functions directly, but it’s at least a bit more manageable:
; vectors (vector-ref vs ref)
cpu time: 2 real time: 1 gc time: 0
cpu time: 97 real time: 98 gc time: 10
; lists (list-ref vs ref)
cpu time: 104 real time: 103 gc time: 0
cpu time: 481 real time: 482 gc time: 0
Whether or not you can afford the overhead depends on what exactly you’re doing, and it’s up to you to make the call for yourself. The specialized operations will always be at least somewhat faster than the ones that defer to them as long as some sort of dynamic dispatch is being performed. As always, remember the rule of performance optimization: don’t guess, measure.