
Current Choice: LuaJIT. Impressive benchmarks, and I am getting used to the syntax. Writing a high-performance ABI will require careful consideration of how I structure my C++.

Other Questions of interest

  1. Gambit-C and Guile as embeddable languages
  2. Lua Performance Tips (I have the option of running with the collector disabled, and calling the collector at the end of a processing run or runs is always an option).

Background

I'm working on a real-time, high-volume complex event processing system. I have a DSL that represents the schema of the event structure at the source, the storage format, certain domain-specific constructs, the firing of internal events (to structure and drive general-purpose processing), and the encoding of certain processing steps that always happen.

The DSL looks pretty similar to SQL; in fact, I am using Berkeley DB (via its SQLite3 interface) for long-term storage of events. The important part here is that events are processed set-based, as in SQL. However, I have come to the conclusion that I should not add general-purpose processing logic to the DSL, and should instead embed Lua or Lisp to take care of it.

The processing core is built around boost::asio and is multithreaded. RPC is done via protocol buffers, and events are encoded using the protocol buffer IO library; i.e., the events are not structured as protocol buffer objects, they just use the same encoding/decoding library. I will create a dataset object that contains rows, much like how a database engine stores in-memory sets. Processing steps in the DSL will be carried out first, and the results then presented to the general-purpose processing logic.
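For readers unfamiliar with the protocol buffer wire format: integers are written as base-128 varints, which is what lets small values occupy a single byte. The following is a minimal, self-contained sketch of that encoding in C++. It does not call the protobuf library (which has its own coded streams), and the names `encode_varint` and `decode_varint` are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Append `value` to `out` as a base-128 varint: 7 payload bits per byte,
// least-significant group first, high bit set on every byte except the last.
void encode_varint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode a varint starting at `pos`. On success, advance `pos` past it.
// On truncated input or an over-long (>10 byte) encoding, return
// std::nullopt and leave `pos` unchanged.
std::optional<std::uint64_t> decode_varint(const std::vector<std::uint8_t>& in,
                                           std::size_t& pos) {
    std::uint64_t result = 0;
    std::size_t i = pos;
    for (int shift = 0; shift < 64; shift += 7) {
        if (i >= in.size()) return std::nullopt;  // ran out of bytes
        std::uint8_t byte = in[i++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {  // last byte of this varint
            pos = i;
            return result;
        }
    }
    return std::nullopt;  // too long for a 64-bit value
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`, while any value below 128 encodes to itself in one byte.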

Regardless of which embeddable scripting environment I use, each thread in my processing core will probably need its own embedded-language environment (that is at least how Lua requires it if you are doing multithreaded work).
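One common way to give each worker thread its own interpreter is a `thread_local` instance created lazily on first use. The sketch below assumes a hypothetical `ScriptEnv` class standing in for the real interpreter state (for Lua, a `lua_State*` opened with `luaL_newstate` and closed with `lua_close`); no real interpreter is created here.

```cpp
#include <thread>

// Hypothetical stand-in for an embedded interpreter state. A real version
// would open the VM in the constructor and close it in the destructor;
// this one only records which thread created it.
class ScriptEnv {
public:
    ScriptEnv() : owner_(std::this_thread::get_id()) {}
    ScriptEnv(const ScriptEnv&) = delete;             // never share or copy
    ScriptEnv& operator=(const ScriptEnv&) = delete;  // an interpreter state
    std::thread::id owner() const { return owner_; }

private:
    std::thread::id owner_;
};

// Each thread gets its own environment on first call; it is destroyed when
// the thread exits. No locking is needed because no two threads ever touch
// the same ScriptEnv.
ScriptEnv& thread_env() {
    thread_local ScriptEnv env;
    return env;
}
```

Since boost::asio may run a handler on any thread that is servicing the io_service, a handler would fetch its environment with `thread_env()` at call time rather than holding on to a reference obtained on another thread.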

The Question(s)

The choice at the moment is between Lisp (ECL) and Lua. Keeping in mind that performance and throughput are strong requirements, minimising memory allocations is highly desirable:

  1. If you were in my position, which language would you choose?

  2. Are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation)? JavaScript V8, perhaps?

  3. Does Lisp fit the domain better? I don't think Lua and Lisp are that different in terms of what they provide. Call me out :D

  4. Are there any other properties (like the ones below) I should be thinking about?

  5. I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the cost of a scripting-language call by orders of magnitude, and that picking either language will not add much overhead to the overall throughput. Am I on the right track? :D
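Question 5 can be framed as a simple cost model. Let t_s be the per-event cost of the scripting call and t_io the per-event cost of database IO (both symbols introduced here); f is then the share of per-event time spent in the scripting layer. The numbers are hypothetical, chosen only to illustrate the arithmetic, not measurements.

```latex
f = \frac{t_s}{t_s + t_{io}}, \qquad
t_s = 1\,\mu\mathrm{s},\; t_{io} = 100\,\mu\mathrm{s}
\;\Longrightarrow\; f = \frac{1}{101} \approx 0.99\%
```

The assertion holds as long as t_s stays well below t_io. If IO is batched or cached so that its per-event cost shrinks, the scripting call becomes a larger share and is worth measuring.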

Desired Properties

  1. I would like to map my dataset onto a Lisp list or Lua table, and I would like to minimise redundant data copies. For example, adding a row from one dataset to another should use reference semantics where possible if both tables have the same shape.

  2. I can guarantee that the dataset passed as input will not change whilst the Lua/Lisp call is in progress. If possible, I would also like Lua or Lisp to enforce that the dataset is not altered.

  3. After the embedded call ends, the datasets should be destroyed; any references created would need to be replaced with copies (I guess).
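A minimal C++ sketch of how properties 1 to 3 could look on the host side, assuming hypothetical `Row` and `Dataset` types (rows as string vectors, the schema as a list of column names) rather than a real in-memory layout. Rows are immutable and held through `std::shared_ptr<const Row>`, so adding a row to a dataset of the same shape copies a pointer, and the `const` stops C++ code from modifying a row in place. Because rows are reference-counted, a reference that escapes the call keeps its row alive after the source dataset is destroyed; `detached_copy` shows the explicit deep copy for cases where sharing must end. Enforcing read-only access on the script side is a separate step; in Lua it is usually done with a proxy table whose `__newindex` metamethod raises an error.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row/dataset types for illustration only.
using Row = std::vector<std::string>;
using RowRef = std::shared_ptr<const Row>;  // shared, immutable row

class Dataset {
public:
    explicit Dataset(std::vector<std::string> schema) : schema_(std::move(schema)) {}

    const std::vector<std::string>& schema() const { return schema_; }
    std::size_t size() const { return rows_.size(); }
    const RowRef& row(std::size_t i) const { return rows_.at(i); }

    // Build a new row owned (initially) only by this dataset.
    void add(Row values) {
        if (values.size() != schema_.size())
            throw std::invalid_argument("row shape mismatch");
        rows_.push_back(std::make_shared<Row>(std::move(values)));
    }

    // Reference semantics: when both datasets have the same shape, share
    // the existing row instead of copying its contents.
    void add_shared(const Dataset& src, std::size_t i) {
        if (src.schema_ != schema_)
            throw std::invalid_argument("schema mismatch");
        rows_.push_back(src.rows_.at(i));
    }

    // Deep copy: every row is duplicated, so nothing is shared with *this.
    Dataset detached_copy() const {
        Dataset out(schema_);
        for (const RowRef& r : rows_)
            out.rows_.push_back(std::make_shared<Row>(*r));
        return out;
    }

private:
    std::vector<std::string> schema_;
    std::vector<RowRef> rows_;
};
```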

DSL Example

I attach a DSL for your viewing pleasure so you can get an idea of what I am trying to achieve. Note: The DSL does not show general purpose processing.

// Derived Events : NewSession EndSession
NAMESPACE WebEvents
{
  SYMBOLTABLE DomainName(TEXT) AS INT4;
  SYMBOLTABLE STPageHitId(GUID) AS INT8;
  SYMBOLTABLE UrlPair(TEXT hostname ,TEXT scriptname) AS INT4;
  SYMBOLTABLE UserAgent(TEXT UserAgent) AS INT4;  

  EVENT 3:PageInput
  {
    //------------------------------------------------------------//
    REQUIRED 1:PagehitId              GUID;
    REQUIRED 2:Attribute              TEXT;
    REQUIRED 3:Value                  TEXT; 

    FABRICATED 4:PagehitIdSymbol      INT8;
    //------------------------------------------------------------//

    PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
                    OR Symbolise(PagehitId) USING STPagehitId;
  }

  // Derived Event : Pagehit
  EVENT 2:PageHit
  {
    //------------------------------------------------------------//
    REQUIRED 1:PageHitId              GUID;
    REQUIRED 2:SessionId              GUID;
    REQUIRED 3:DateHit                DATETIME;
    REQUIRED 4:Hostname               TEXT;
    REQUIRED 5:ScriptName             TEXT;
    REQUIRED 6:HttpRefererDomain      TEXT;
    REQUIRED 7:HttpRefererPath        TEXT;
    REQUIRED 8:HttpRefererQuery       TEXT;
    REQUIRED 9:RequestMethod          TEXT; // or int4
    REQUIRED 10:Https                 BOOL;
    REQUIRED 11:Ipv4Client            IPV4;
    OPTIONAL 12:PageInput             EVENT(PageInput)[];

    FABRICATED 13:PagehitIdSymbol     INT8;
    //------------------------------------------------------------//
    PagehitIdSymbol AS  PROVIDED(INT8 ph_symbol) 
                    OR  Symbolise(PagehitId) USING STPagehitId;

    FIRE INTERNAL EVENT PageInput PROVIDE(PageHitIdSymbol);
  }

  EVENT 1:SessionGeneration
  {
    //------------------------------------------------------------//
    REQUIRED    1:BinarySessionId   GUID;
    REQUIRED    2:Domain            STRING;
    REQUIRED    3:MachineId         GUID;
    REQUIRED    4:DateCreated       DATETIME;
    REQUIRED    5:Ipv4Client        IPV4;
    REQUIRED    6:UserAgent         STRING;
    REQUIRED    7:Pagehit           EVENT(pagehit);

    FABRICATED  8:DomainId          INT4;
    FABRICATED  9:PagehitId         INT8;
    //-------------------------------------------------------------//

    DomainId  AS SYMBOLISE(domain)            USING DomainName;
    PagehitId AS SYMBOLISE(pagehit:PagehitId) USING STPagehitId;

    FIRE INTERNAL EVENT pagehit PROVIDE (PagehitId);
  } 
}
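The `SYMBOLTABLE` declarations and `SYMBOLISE ... USING` clauses appear to map each distinct key (a domain name, a page-hit GUID) to a compact integer of the declared width. An in-memory sketch of that reading follows. `SymbolTable` and `provided_or_symbolise` are hypothetical names, ids starting at 0 is an assumption, and persistence to the event store is left out. The second function models `AS PROVIDED(...) OR Symbolise(...)` as "use the id supplied by the parent event if there is one, otherwise look it up", which is an assumed reading of the DSL rather than a documented semantic.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each distinct key to a dense integer id (INT4 -> int32_t,
// INT8 -> int64_t) the first time it is seen, and back again.
template <typename Id>
class SymbolTable {
public:
    Id symbolise(const std::string& key) {
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;  // already known
        Id id = static_cast<Id>(keys_.size());    // next free id
        ids_.emplace(key, id);
        keys_.push_back(key);
        return id;
    }

    const std::string& lookup(Id id) const {
        return keys_.at(static_cast<std::size_t>(id));
    }

private:
    std::unordered_map<std::string, Id> ids_;
    std::vector<std::string> keys_;
};

// Prefer an id provided by the firing event; otherwise symbolise the key.
template <typename Id>
Id provided_or_symbolise(const std::optional<Id>& provided,
                         const std::string& key, SymbolTable<Id>& table) {
    return provided ? *provided : table.symbolise(key);
}
```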

This project is a component of a Ph.D. research project and is/will be free software. If you're interested in working with me on (or contributing to) this project, please leave a comment :D


4 Answers

Answer 1

I strongly agree with @jpjacobs's points. Lua is an excellent choice for embedding, unless there's something very specific about lisp that you need (for instance, if your data maps particularly well to cons-cells).

I've used lisp for many many years, BTW, and I quite like lisp syntax, but these days I'd generally pick Lua. While I like the lisp language, I've yet to find a lisp implementation that captures the wonderful balance of features/smallness/usability for embedded use the way Lua does.

Lua:

  1. Is very small, both source and binary, an order of magnitude or more smaller than many more popular languages (Python, etc.). Because the Lua source code is so small and simple, it's perfectly reasonable to just include the entire Lua implementation in your source tree, if you want to avoid adding an external dependency.

  2. Is very fast. The Lua interpreter is much faster than most scripting languages (again, an order of magnitude is not uncommon), and LuaJIT2 is a very good JIT compiler for some popular CPU architectures (x86, arm, mips, ppc). Using LuaJIT can often speed things up by another order of magnitude, and in many cases, the result approaches the speed of C. LuaJIT is also a "drop-in" replacement for standard Lua 5.1: no application or user code changes are required to use it.

  3. Has LPEG. LPEG is a "Parsing Expression Grammar" library for Lua, which allows very easy, powerful, and fast parsing, suitable for both large and small tasks; it's a great replacement for yacc/lex/hairy regexps. [I wrote a parser using LPEG and LuaJIT, which is much faster than the yacc/lex parser I was trying to emulate, and was very easy and straightforward to create.] LPEG is an add-on package for Lua, but is well worth getting (it's one source file).

  4. Has a great C-interface, which makes it a pleasure to call Lua from C, or call C from Lua. For interfacing large/complex C++ libraries, one can use SWIG, or any one of a number of interface generators (one can also just use Lua's simple C interface with C++ of course).

  5. Has liberal licensing ("BSD-like"), which means Lua can be embedded in proprietary projects if you wish, and is GPL-compatible for FOSS projects.

  6. Is very, very elegant. It's not Lisp, in that it's not based around cons cells, but it shows clear influences from languages like Scheme, with a straightforward and attractive syntax. Like Scheme (at least in its earlier incarnations), it tends towards "minimal" but does a good job of balancing that with usability. For somebody with a Lisp background (like me!), a lot about Lua will seem familiar and "make sense", despite the differences.

  7. Is very flexible, and such features as metatables allow easily integrating domain-specific types and operations.

  8. Has a simple, attractive, and approachable syntax. This might not be such an advantage over lisp for existing lisp users, but might be relevant if you intend to have end-users write scripts.

  9. Is designed for embedding, and besides its small size and fast speed, has various features such as an incremental GC that make using a scripting language more viable in such contexts.

  10. Has a long history, and responsible and professional developers, who have shown good judgment in how they've evolved the language over the last 2 decades.

  11. Has a vibrant and friendly user-community.

Answer 2

You don't state what platform you are using, but if it would be capable of using LuaJIT 2 I'd certainly go for that, since execution speeds approach that of compiled code, and interfacing with C code just got a whole lot easier with the FFI library.

I don't know other embeddable scripting languages well enough, so I can't really compare what they can do or how they work with tables.

Lua mostly works with references: all functions, userdata, and tables are used by reference, and once no references to them are left they become eligible for collection in a subsequent GC cycle. Strings are interned, so any given string is stored in memory only once. The thing to take into account is that you should avoid creating and subsequently discarding lots of tables, since this can slow down the GC cycle (as explained in the Lua gem you cited).
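The interning idea can be illustrated outside Lua with a small C++ pool: equal strings map to a single stored copy, so equality checks reduce to pointer comparisons. `StringPool` is a hypothetical illustration of the technique, not how Lua's internal string table is implemented.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Keeps one copy of each distinct string. Pointers to elements of an
// unordered_set stay valid across rehashing, so the returned pointer can
// be used as a cheap identity for the string.
class StringPool {
public:
    const std::string* intern(std::string_view s) {
        // emplace is a no-op if an equal string is already stored;
        // either way, .first points at the stored copy.
        return &*pool_.emplace(s).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```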

For parsing your code sample, I'd take a look at the LPEG library.

Answer 3

There are a number of options for implementing high-performance embedded compilers. One is the Mono VM: it comes with dozens of ready-made, high-quality languages implemented on top of it, and it is quite embeddable (see how Second Life uses it). It is also possible to use LLVM; your DSL does not look complicated, so implementing an ad hoc compiler would not be a big deal.

Answer 4

I happen to have worked on a project that is similar to yours in some respects. It's a cross-platform system running on Windows CE, Android, and iOS, and I needed to maximize the amount of cross-platform code, so C/C++ combined with an embeddable language was a good choice. Here is my solution, as it relates to your questions.

  1. If you were in my position, which language would you choose?

The DSL in my project is similar to yours. For performance, I wrote a compiler with Yacc/Lex that compiled the DSL to a binary format for the runtime, plus a set of APIs to get information out of that binary. But it was annoying whenever something changed in the DSL syntax, because I had to modify both the compiler and the APIs. So I abandoned the DSL and switched to XML (don't write the XML by hand; a well-defined schema is worth it), wrote a general compiler converting XML to Lua tables, and reimplemented the APIs in Lua. By doing this I gained two benefits, readability and flexibility, without any perceivable performance degradation.
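One way to implement such an XML-to-Lua-table compiler is to emit Lua table-constructor source from the parsed tree. In the sketch below, the `Node` struct is a hypothetical stand-in for whatever an XML parser produces; the element name is stored under a `tag` key, attributes become string keys, and child elements go into the array part. XML parsing itself is not shown, and this is not necessarily how the answer's compiler works.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical parsed-XML element.
struct Node {
    std::string name;
    std::map<std::string, std::string> attrs;  // sorted, so output is stable
    std::vector<Node> children;
};

// Quote a string as a Lua short-string literal.
std::string lua_quote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\0': out += "\\000"; break;  // 3 digits: no clash with a following digit
            default:   out += c;
        }
    }
    return out + "\"";
}

// Emit e.g. {tag="page",["id"]="1",{tag="input"}}
std::string to_lua_table(const Node& n) {
    std::string out = "{tag=" + lua_quote(n.name);
    for (const auto& [key, value] : n.attrs)
        out += ",[" + lua_quote(key) + "]=" + lua_quote(value);
    for (const Node& child : n.children)
        out += "," + to_lua_table(child);  // array part: children in order
    return out + "}";
}
```

The generated text can then be loaded as a Lua chunk (prefixed with `return `). An attribute literally named `tag` would collide with the element name, which the XML schema would need to rule out.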

  2. Are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation)? JavaScript V8, perhaps?

Before choosing Lua, I considered Embedded Ch (mostly used in industrial control systems), embedded Lisp, and Lua. In the end Lua stood out, because it is well integrated with C, it has a thriving community, and it is easy for other team members to learn. As for JavaScript V8, using it in an embedded real-time system would be like using a steam hammer to crack nuts.

  3. Does Lisp fit the domain better? I don't think Lua and Lisp are that different in terms of what they provide. Call me out :D

For my domain, Lisp and Lua are semantically equally capable: both can handle an XML-based DSL easily, and you could even write a simple compiler converting XML to a Lisp list or a Lua table. Both can handle domain logic easily. But Lua is better integrated with C/C++, which is what Lua aims for.

  4. Are there any other properties (like the ones below) I should be thinking about?

Whether you work alone or with team members is also a factor in choosing a solution. Nowadays, not many programmers are familiar with Lisp-like languages.

  5. I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the cost of a scripting-language call by orders of magnitude, and that picking either language will not add much overhead to the overall throughput. Am I on the right track? :D

Here is a list of programming language performance comparisons, and here is a list of access times for computer components. If your system is IO-bound, the scripting overhead is not the key concern. My system is an O&M (Operation & Maintenance) system, so script performance is insignificant.