5
votes

Consider this bit of Chez Scheme code:

(import (chezscheme))

(define (list-enumerate ls val proc)
  (let loop ((ls ls) (return? #f) (val val))
    (if (or (null? ls)
            return?)
        val
        (call-with-values (lambda () (proc val (car ls)))
          (lambda (return? val)
            (loop (cdr ls) return? val))))))

(define (list-index ls proc)
  (list-enumerate ls
                  0
                  (lambda (i elt)
                    (if (proc elt)
                        (values #t i)
                        (values #f (+ i 1))))))

(define n 100000)

(define data (iota n))

(time (list-index data (lambda (elt) (= elt (- n 1)))))

Run it:

~ $ scheme --script ~/scratch/_list-enumerate-allocation-test-chez-a.sps 
(time (list-index data ...))
    no collections
    3 ms elapsed cpu time
    4 ms elapsed real time
    8 bytes allocated

Wow, it reports that only 8 bytes were allocated.

Let's run it again using the --program option instead of --script:

~ $ scheme --program ~/scratch/_list-enumerate-allocation-test-chez-a.sps 
(time (list-index data ...))
    no collections
    3 ms elapsed cpu time
    3 ms elapsed real time
    800000 bytes allocated

Yikes, 800000 bytes allocated.

What's up with the difference?

Ed

1

1 Answers

7
votes

Here's a note from Kent Dybvig in response:


That's an interesting question.

When run with --script, which uses the REPL semantics, the variables defined in the script, like list-enumerate and list-index, are mutable, which inhibits interprocedural optimizations, including inlining. When run with --program, however, the variables are immutable, which allows interprocedural optimizations.

In this case, --program allows the compiler to inline list-enumerate into list-index's body and in turn the lambda expression within list-index's body into list-enumerate's body. The end result is a conditional expression within the call-with-values producer expression. This causes the compiler to create a closure for the consumer, to avoid code duplication along the then and else branches of the conditional. This closure is created each time through list-enumerate's loop, resulting in the extra allocation overhead. That's the way optimizations often go. Mostly you win, but sometimes you lose. The good news is, on balance, the benefits outweight he costs, even in your program. I put the call to list-index in a loop (the modified code is below) and found that that with --program, the code runs about 30% faster.

Kent


(import (chezscheme))

(define (list-enumerate ls val proc)
  (let loop ((ls ls) (return? #f) (val val))
    (if (or (null? ls)
            return?)
        val
        (call-with-values (lambda () (proc val (car ls)))
          (lambda (return? val)
            (loop (cdr ls) return? val))))))

(define (list-index ls proc)
  (list-enumerate ls
                  0
                  (lambda (i elt)
                    (if (proc elt)
                        (values #t i)
                        (values #f (+ i 1))))))

(define n 100000)

(define data (time (iota n)))

(let ()
(define runalot
  (lambda (i thunk)
    (let loop ([i i])
      (let ([x (thunk)])
        (if (fx= i 1)
            x
            (loop (fx- i 1)))))))

(time
  (runalot 1000
    (lambda ()
      (list-index data (lambda (elt) (= elt (- n 1))))))))