2
votes

Often I hear that using a symbol table optimizes look ups of symbols in a programming language. Currently, my language is implemented only as an interpreter, not as a compiler. I do not yet want to allocate the time to build a compiler, so I'm attempting to optimize the interpreter. The language is based on Scheme semantics and syntax for the most part, and is statically-scoped. I use the AST for executing code at run-time (in my interpreter, implemented as discriminated unions just like the AST in Write Yourself a Scheme in 48 Hours.

Unfortunately, symbol look-up in my interpreter is slow due to the use of an F# Map to contain and look up symbols by name. (Well, in truth, it uses a Trie, but the performance is similarly problematic). I would like to instead use a symbol tree to achieve faster symbol lookup. However, I don't know if or how one can implement symbols tables in an interpreter. I hear about them only in the context of a compiler.

Is this possible? If the implementation strategy or performance differs from a symbol table in a compiler, could you describe the differences? Finally, is there an existing reference implementation of a symbol tree in an interpreter I might look at?

Thank you!

1
A symbol table is anything used to store symbols. It doesn't refer to a particular data structure (AFAIK).Daniel
Have you considered using a hash table? I realize you probably want persistance (presumably so each scope has its own table), but you can achieve this with a linked list of "scopes" each having a read-only dictionary of symbols. Symbol resolution would consist of starting with the innermost scope and traversing outwards, checking each symbol table along the way.Daniel
Danial - I think the use of a symbol table implies at least the property of constant-time look up. That's my guess from what literature I've read, at any rate.Bryan Edds
I think this is a bit like your last question: without knowing why Map is too slow and how fast is fast enough only general advice can be given, not precise remedies. A hash table would provide O(1) lookup per scope, so O(m), where m is the number of scopes, overall. I can't imagine it not being fast enough for an interpreter.Daniel
One thing that throws me a bit is that Jon Harrop specifically said to use symbol tables, and I presume he had a specific data structure in mind. However, I'm not sure if I made it clear that I need the optimization for an interpreter rather than a compiler. Sorry for these unclear questions - I'm groping around in the dark a bit. I'm hoping Jon pops in here to clarify.Bryan Edds

1 Answers

8
votes

A symbol table associates some information with every symbol. In an interpreter, you would perhaps associate values with symbols. Map is one implementation particularly suitable for functional interpreters.

If you want to optimize your interpreter, get rid of the need for a symbol table at runtime. One way to to go is De Bruijn idexing.

There is also nice literature on mechanically deriving optimized interpreters, VMs and compilers from a functional interpreter, for example:

http://www.brics.dk/RS/03/14/BRICS-RS-03-14.pdf

For a simple example, consider lambda calculus with constants encoded with De Bruijn indices. Notice that the evaluator gets by without a symbol table, because it can use integers for lookup.

type exp =
    | App of exp * exp
    | Const of int
    | Fn of exp
    | Var of int

type value =
    | Closure of exp * env
    | Number of int

and env = value []

let lookup env i = Array.get env i
let extend value env = Array.append [| value |] env
let empty () : env = Array.empty

let eval exp =
    let rec eval env exp =
        match exp with
        | App (f, x) ->
            match eval env f with
            | Closure (bodyF, envF) ->
                let vx = eval env x
                eval (extend vx envF) bodyF
            | _ -> failwith "?"
        | Const x -> Number x
        | Fn e -> Closure (e, env)
        | Var x -> lookup env x
    eval (empty ()) exp