No, not currently (<=5.3.x).
But there are a bunch of things you can do to improve the situation:
> …I run out of memory quickly as the garbage collector can not keep up.
The GC tracks the sizes of full userdata (and increases the GC debt accordingly), but doesn't know the sizes of light userdata. If you `lua_pushlightuserdata( L, ptr )` a value allocated elsewhere (by `malloc`, `mmap`, etc.), the GC "sees" a size of zero. If you use `lua_newuserdata( L, size )` for allocation, the GC knows the full size.
You probably use `lua_newuserdata` for a thin wrapper struct (to get `__gc`) around "fat" data referenced (invisibly to Lua) from that struct, so the GC sees a few kilobytes of used memory while you're actually hogging gigabytes. If possible, try switching the allocator for the inner data from `malloc` to `lua_newuserdata`. (Unless you need to share data across Lua states, need special alignment, or have other constraints...) That should solve most of the problems. (It will still be constantly collecting stuff, but at least it won't OOM anymore.)
Failing that, insert explicit GC step calls before every allocation, with a step size equal to the size of the invisible allocation in kilobytes (to emulate most of what you'd get with full userdata), and play with the GC step multiplier and/or pause.
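From the Lua side, such a step call could be wrapped like this (a hypothetical helper; `raw_alloc` stands for whatever actually creates the externally-backed value):

```lua
-- Hypothetical helper: tell the GC about an "invisible" allocation of
-- kb kilobytes, then perform the real allocation.
local function alloc_with_gc_hint( kb, raw_alloc, ... )
  collectgarbage( "step", kb )  -- advance the GC as if kb KB had been allocated
  return raw_alloc( ... )
end
```

(From C, the equivalent is `lua_gc( L, LUA_GCSTEP, kb )`.)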
If that isn't enough, you can try more involved things that require uglifying your code.
Operator metamethods only know about the two arguments, so they always have to create a new target value. If you use functions instead of operators, you can pass an existing disused value as the target, e.g.
local psi_ij, psi_ijj
for j=1,N do
for i=j,N do
psi_ij = qmul( psi[i],psi[j], psi_ij )
psi_ijj = qmul( psi_ij,psi[j], psi_ijj )
psi[j] = qsub( psi[j],psi_ijj, psi[j] )
end
end
where `qadd`, `qsub`, `qmul`, etc. would take `(a, b[, target])` and re-use `target` if given, or otherwise allocate new storage. (That would reduce the loop from N^2 allocations down to two allocations.) A nice trick if you define them this way around is that you can also use the same functions as operator metamethods. (`target` will simply always be absent, so if you don't care about allocations you can just use the operators.)
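A minimal sketch of what such a `qmul` could look like, assuming (purely for illustration) that values are plain 4-element tables `{w, x, y, z}` rather than userdata:

```lua
-- Sketch: quaternion multiply with optional target re-use.
-- Values here are plain 4-element tables; substitute your userdata type.
local function qmul( a, b, target )
  local r = target or { 0, 0, 0, 0 }  -- re-use target if given, else allocate
  local aw, ax, ay, az = a[1], a[2], a[3], a[4]
  local bw, bx, by, bz = b[1], b[2], b[3], b[4]
  r[1] = aw*bw - ax*bx - ay*by - az*bz
  r[2] = aw*bx + ax*bw + ay*bz - az*by
  r[3] = aw*by - ax*bz + ay*bw + az*bx
  r[4] = aw*bz + ax*by - ay*bx + az*bw
  return r
end

-- The same function doubles as the metamethod (target is then always nil):
-- mt.__mul = qmul
```

Because all components are read into locals before `r` is written, it is even safe to pass `a` or `b` itself as `target`.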
(If the `psi[k]` do not all have the same size, so that the sizes of intermediate values vary, you can explicitly mark an incompatible `target` as free inside the math function, then maybe combine this with lookup storage as below:)
Another option is to keep a store of disused values: explicitly mark values as disused, then let allocators preferentially reuse existing disused values. Sketchy untested code:
-- hidden in implementation somewhere
-- MT to mark per-size stores as weak so GC can collect values
local weak = { __mode = "k" }
-- storage of freed values, auto-create per-size storage table
local freed = setmetatable( { }, {
__index = function( t, k )
local v = setmetatable( { }, weak )
t[k] = v
return v
end
} )
-- interface to that storage
-- replace size( v ) by some way to get a size/layout descriptor
-- (e.g. number or string giving size in bytes or dimensions or ...)
function free( v, ... )
freed[size( v )][v] = true -- mark as free
if ... then return free( ... ) end
end
-- return re-usable value of size/layout vsize, or nil if allocation needed
function reuse( vsize )
local v = next( freed[vsize] )
if not v then return nil end
freed[vsize][v] = nil -- un-mark
return v
end
With this, your allocator should first check `reuse` and only actually allocate a new value if that returns nil. And your code would have to be changed like e.g.:
for j=1,N do
for i=j,N do
local psi_j = psi[j]
local psi_ij = psi[i] * psi_j
local psi_ijj = psi_ij * psi_j
psi[j] = psi[j] - psi_ijj
free( psi_j, psi_ij, psi_ijj )
end
end
With a changed definition of free, like
local temp = setmetatable( { }, { __mode = "v" } )
function free( v )
table.insert( temp, v )
if #temp > 2 then
local w = table.remove( temp, 1 )
freed[size( w )][w] = true
end
return v
end
you add enough delay that you can write expressions inline (by the time the value is actually marked as free, a binary operation will have been evaluated if nothing else happens in between):
local _ = free
for j=1,N do
for i=j,N do
local psi_j = psi[j]
psi[j] = psi_j - _(_(psi[i] * psi_j) * psi_j)
free( psi_j )
end
end
...which looks better and gets rid of the need to track intermediate values, but is pretty easy to break by accidentally freeing too early. (E.g. instead of freeing `psi[j]` at the end, doing `psi[j] = _(psi[j]) - _(_(psi[i] * psi[j]) * psi[j])` would break.)
While this last variation (almost) recovers normal-looking expressions, it's really not that good an idea. And the one before it isn't much better than using functions instead of operators, but requires more brittle bookkeeping and is slower. So if you decide to micro-manage allocations, I'd suggest using a variation on the first version. (You could also make the functions methods, possibly with the arguments switched around (`target:add( a, b )`) like Torch does, but then you need explicit allocation of `target` in the code and need to know the sizes before you do the operation... All these variations suck in different ways. Try to avoid having to go this far, and failing that, build the one you dislike least.)