Current Choice: lua-jit. Impressive benchmarks, I am getting used to the syntax. Writing a high performance ABI will require careful consideration on how I will structure my C++.
Other Questions of interest
- Gambit-C and Guile as embeddable languages
- Lua Performance Tips (have the option of running with a disabled collector, and calling the collector at the end of a processing run(s) is always an option).
Background
I'm working on a realtime high volume (complex) event processing system. I have a DSL that represents the schema of the event structure at source, the storage format, certain domain specific constructs, firing internal events (to structure and drive general purpose processing), and encoding certain processing steps that always happen.
The DSL looks pretty similar to SQL, infact I am using berkeley db (via sqlite3 interface) for long-term storage of events. The important part here is that the processing of events is done set-based, like SQL. I have come to conclusion however that I should not add general-purpose processing logic to the DSL, and rather embed lua or lisp to take care of this.
The processing core is built arround boost::asio, it is multithreaded, rpc is done via protocol buffers, events are encoded using the protocol buffer IO library --i.e., the events are not structured using protocol buffer object they just use the same encoding/decoding library. I will create a dataset object that contains rows, pretty similar to how a database engine stores in memory sets. processing steps in the DSL will be taken care of first and then presented to the general purpose processing logic.
Regardless of what embeddable scripting environment I use, each thread in my processing core will probably needs it's own embedded-language-environment (that is how lua requires it to be at least if you are doing multi-threaded work).
The Question(s)
The choice at the moment is between lisp ECL and lua. Keeping in mind that performance and throughput is a strong requirement, this means minimising memory allocations is highly desired:
If you were in my position which language would you chose ?
are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation). Javascript v8 perhaps ?
Does lisp fit the domain better ? I don't think lua and lisp are that different in terms of what they provide. Call me out :D
Are there any other properties (like the ones below) I should be thinking about ?
I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the scripting language call on orders of magnitude, and that picking either will not add much overhead to the overall throughput. Am I on the right track ? :D
Desired Properties
I would like to map my dataset onto a lisp list or lua table and I would like to minimise redundant data copies. For example adding a row from one dataset to another should try to use reference semantics if both tables have the same shape.
I can guarantee that the dataset that is passed as input will not change whilst I have made the lua/lisp call. I want lua and lisp to enforce not altering the dataset as well if possible.
After the embedded call end's the datasets should be destroyed, any references created would need to be replaced with copies (I guess).
DSL Example
I attach a DSL for your viewing pleasure so you can get an idea of what I am trying to achieve. Note: The DSL does not show general purpose processing.
// Derived Events : NewSession EndSession
NAMESPACE WebEvents
{
SYMBOLTABLE DomainName(TEXT) AS INT4;
SYMBOLTABLE STPageHitId(GUID) AS INT8;
SYMBOLTABLE UrlPair(TEXT hostname ,TEXT scriptname) AS INT4;
SYMBOLTABLE UserAgent(TEXT UserAgent) AS INT4;
EVENT 3:PageInput
{
//------------------------------------------------------------//
REQUIRED 1:PagehitId GUID
REQUIRED 2:Attribute TEXT;
REQUIRED 3:Value TEXT;
FABRRICATED 4:PagehitIdSymbol INT8;
//------------------------------------------------------------//
PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
OR Symbolise(PagehitId) USING STPagehitId;
}
// Derived Event : Pagehit
EVENT 2:PageHit
{
//------------------------------------------------------------//
REQUIRED 1:PageHitId GUID;
REQUIRED 2:SessionId GUID;
REQUIRED 3:DateHit DATETIME;
REQUIRED 4:Hostname TEXT;
REQUIRED 5:ScriptName TEXT;
REQUIRED 6:HttpRefererDomain TEXT;
REQUIRED 7:HttpRefererPath TEXT;
REQUIRED 8:HttpRefererQuery TEXT;
REQUIRED 9:RequestMethod TEXT; // or int4
REQUIRED 10:Https BOOL;
REQUIRED 11:Ipv4Client IPV4;
OPTIONAL 12:PageInput EVENT(PageInput)[];
FABRRICATED 13:PagehitIdSymbol INT8;
//------------------------------------------------------------//
PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
OR Symbolise(PagehitId) USING STPagehitId;
FIRE INTERNAL EVENT PageInput PROVIDE(PageHitIdSymbol);
}
EVENT 1:SessionGeneration
{
//------------------------------------------------------------//
REQUIRED 1:BinarySessionId GUID;
REQUIRED 2:Domain STRING;
REQUIRED 3:MachineId GUID;
REQUIRED 4:DateCreated DATETIME;
REQUIRED 5:Ipv4Client IPV4;
REQUIRED 6:UserAgent STRING;
REQUIRED 7:Pagehit EVENT(pagehit);
FABRICATED 8:DomainId INT4;
FABRICATED 9:PagehitId INT8;
//-------------------------------------------------------------//
DomainId AS SYMBOLISE(domain) USING DomainName;
PagehitId AS SYMBOLISE(pagehit:PagehitId) USING STPagehitId;
FIRE INTERNAL EVENT pagehit PROVIDE (PagehitId);
}
}
This project is a component of a Ph.D research project and is/will be free software. If your interested in working with me (or contributing) on this project, please leave a comment :D