2 votes

I've implemented code to solve quantum-mechanics problems on top of Lua (via the C API). It adds quantum-mechanical operators and wave functions to the scripting language. So far so good. The challenge is that the wavefunction userdata can be large (each userdata holds pointers to arrays between 1 MB and 1 GB). I've added standard garbage-collection methods, and for simple cases they work:

    static int LuaWavefunctionDestroy(lua_State * L)
    {
      WaveFunctionType *psi = (WaveFunctionType*) luaL_checkudata(L, 1, "Wavefunction_Type");
      WaveFunctionFree(psi);
      return 0;
    }

and registered it in the metatable as __gc:

    static const struct luaL_Reg Wavefunction_meta[] = {
      {"__add", LuaWavefunctionAdd},
      {"__sub", LuaWavefunctionSub},
      {"__mul", LuaWavefunctionMul},
      {"__div", LuaWavefunctionDiv},
      {"__unm", LuaWavefunctionUnm},
      {"__index", LuaWavefunctionIndex},
      {"__newindex", LuaWavefunctionNewIndex},
      {"__tostring", LuaWavefunctionToString},
      {"__gc", LuaWavefunctionDestroy},
      {NULL, NULL}
    };

However, if I now run the following Lua code

    for j=1,N do
      for i=j,N do
        psi[j] = psi[j] - ( psi[i] * psi[j] ) * psi[j]
      end
    end

with psi being a table (array) of a few (10-100) wave functions, I quickly run out of memory because the garbage collector cannot keep up.

To make things worse, I've registered several thousand strings and constants (numbers), so a full garbage-collection cycle has to traverse many objects.

Is there a way to run garbage collection on a specific object or on userdata only?

1
Probably not. But you could add a destroy function to your userdata, which frees the data early. – user253751
I think you should call the garbage collector incrementally as you go, rather than trying to do it all in one shot. For instance, check all of the options at lua.org/manual/5.3/manual.html#pdf-collectgarbage. I would try the "step" option with various values and see if I can get good results. – Chris Beck

1 Answer

2 votes

No, not currently (<=5.3.x).

But there are a bunch of things you can do to improve the situation:

…I run out of memory quickly as the garbage collector can not keep up.

The GC tracks the sizes of full userdata (and increases the GC debt accordingly), but it doesn't know the size of anything behind a light userdata. If you lua_pushlightuserdata( L, ptr ); a value allocated elsewhere (by malloc, mmap, etc.), the GC "sees" a size of zero. If you allocate with lua_newuserdata( L, size ), the GC knows the full size.

You probably use lua_newuserdata for a thin wrapper struct (to get __gc) around "fat" data referenced (invisibly to Lua) from that struct, so the GC sees a few kilobytes of used memory while you're actually hogging gigabytes. If possible, switch the allocator for the inner data from malloc to lua_newuserdata. (Unless you need to share data across Lua states, need special alignment, or have other constraints...) That should solve most of the problem. (The GC will still be constantly collecting, but at least it won't OOM anymore.)
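A minimal C sketch of that switch, assuming the payload is a flat array of doubles allocated inline in the full userdata (WaveFunctionType's layout and the constructor name are hypothetical; the real struct may differ):

```c
#include <lua.h>
#include <lauxlib.h>

/* Hedged sketch: allocate the sample array inline in the full userdata,
 * so the GC accounts for its full size instead of a thin wrapper. */
typedef struct {
  size_t n;        /* number of samples */
  double data[1];  /* payload, allocated inline via lua_newuserdata */
} WaveFunctionType;

static WaveFunctionType *NewWavefunction(lua_State *L, size_t n)
{
  size_t bytes = sizeof(WaveFunctionType) + (n - 1) * sizeof(double);
  /* the GC now "sees" the whole allocation and raises its debt by it */
  WaveFunctionType *psi = (WaveFunctionType *)lua_newuserdata(L, bytes);
  psi->n = n;
  luaL_getmetatable(L, "Wavefunction_Type");
  lua_setmetatable(L, -2);
  return psi;  /* the userdata is left on top of the Lua stack */
}
```

With this layout the array is freed by Lua together with the userdata, so the __gc handler only needs to release any remaining external resources.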

Failing that, insert explicit GC step calls before every allocation (with a step size equal to the size of the invisible allocation in kilobytes, to emulate most of what you'd get with full userdata), and play with the GC step multiplier and/or pause.
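The step-before-allocation idea can be sketched on the Lua side like this (the collectgarbage options are standard; newpsi is a hypothetical constructor standing in for whatever does the real C-side allocation):

```lua
-- Hedged sketch: pay the GC debt for an allocation the GC can't see
-- by stepping the collector by that many kilobytes beforehand.
collectgarbage( "setstepmul", 400 )  -- collect more aggressively (tune!)
collectgarbage( "setpause", 110 )    -- restart collection soon after a cycle

local function alloc_wavefunction( kilobytes, ... )
  collectgarbage( "step", kilobytes )  -- emulate GC debt of full userdata
  return newpsi( ... )  -- hypothetical constructor doing the real malloc
end
```

The equivalent from C is lua_gc( L, LUA_GCSTEP, kilobytes ) before the malloc.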


If that isn't enough, you can try more involved things that require uglifying your code.

Operator metamethods only see their two operands, so they always have to create a new result value. If you use functions instead of operators, you can pass an existing disused value as the target, e.g.

local psi_ij, psi_ijj
for j=1,N do
  for i=j,N do
    psi_ij  = qmul( psi[i],psi[j], psi_ij )
    psi_ijj = qmul( psi_ij,psi[j], psi_ijj )
    psi[j]  = qsub( psi[j],psi_ijj, psi[j] )
  end
end

where qadd, qsub, qmul, etc. would take (a, b[, target]) and reuse target if given, otherwise allocate new storage. (That reduces the loop from N^2 allocations down to two.) A nice trick with this argument order is that you can also use the same functions as operator metamethods: target will simply always be absent, so if you don't care about allocations you can just use the operators.
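One way such a function might look, sketched in plain Lua (in the real code this would be a C function; psi_alloc and the element-wise accessors are hypothetical stand-ins):

```lua
-- Hedged sketch: (a, b[, target]) arithmetic that reuses target when given.
local function qsub( a, b, target )
  local out = target or psi_alloc( #a )  -- reuse target, else allocate
  for k = 1, #a do
    out[k] = a[k] - b[k]  -- element-wise subtraction on the samples
  end
  return out
end

-- The same function doubles as a metamethod (target is then always nil):
-- Wavefunction_meta.__sub = qsub
```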

(If the psi[k] do not all have the same size, so that the sizes of intermediate values vary, you can have the math function explicitly mark an incompatible target as free, and perhaps combine this with the lookup storage described below.)

Another option is to keep a store of disused values, explicitly mark values as disused, and let allocators preferentially reuse them. Sketchy, untested code:

-- hidden in implementation somewhere
-- MT to mark per-size stores as weak so GC can collect values
local weak = { __mode = "k" }
-- storage of freed values, auto-create per-size storage table
local freed = setmetatable( { }, {
  __index = function( t, k )
    local v = setmetatable( { }, weak )
    t[k] = v
    return v
  end
} )

-- interface to that storage
-- replace size( v ) by some way to get a size/layout descriptor
--   (e.g. number or string giving size in bytes or dimensions or ...)
function free( v, ... )
  freed[size( v )][v] = true  -- mark as free
  if ... then  return free( ... )  end
end
-- return re-usable value of size/layout vsize, or nil if allocation needed
function reuse( vsize )
  local v = next( freed[vsize] )
  if not v then  return nil  end
  freed[vsize][v] = nil    -- un-mark
  return v
end

With this, your allocator should first check reuse and only allocate a new value if that returns nil. Your code would then change to e.g.:

for j=1,N do
  for i=j,N do
    local psi_j = psi[j]
    local psi_ij  = psi[i] * psi_j
    local psi_ijj = psi_ij * psi_j
    psi[j] = psi[j] - psi_ijj
    free( psi_j, psi_ij, psi_ijj )
  end
end

With a changed definition of free, like

local temp = setmetatable( { }, { __mode = "v" } )
function free( v )
  table.insert( temp, v )
  if #temp > 2 then
    local w = table.remove( temp, 1 )
    freed[size( w )][w] = true
  end
  return v
end

you add enough delay that you can write expressions inline: by the time a value is actually marked as free, the binary operation using it will already have been evaluated (if nothing else happens in between):

local _ = free
for j=1,N do
  for i=j,N do
    local psi_j = psi[j]
    psi[j] = psi_j - _(_(psi[i] * psi_j) * psi_j)
    free( psi_j )
  end
end

...which looks better and gets rid of the need to track intermediate values, but it is easy to break by accidentally freeing too early (e.g. instead of freeing psi[j] at the end, writing psi[j] = _(psi[j]) - _(_(psi[i] * psi[j]) * psi[j]) would break).

While this last variation (almost) recovers normal-looking expressions, it's really not a good idea. And the one before it isn't much better than using functions instead of operators, yet requires more brittle bookkeeping and is slower. So if you decide to micro-manage allocations, I'd suggest a variation on the first version. (You could also expose the functions as methods, possibly with the arguments switched around (target:add(a,b)) as torch does, but then you need to allocate target explicitly in your code and know the sizes before the operation runs...) All these variations suck in different ways. Try to avoid having to go this far; failing that, build the one you dislike least.