First of all, I know my question looks familiar, but I am not asking why a seg-fault occurs when sharing a Lua state between different pthreads. I am asking why they don't seg-fault in the specific case described below. I tried to organize it as well as I could, but I realize it is very long. Sorry about that.

A bit of background: I am writing a program that uses the Lua interpreter as a base for the user to execute instructions, and the ROOT libraries (https://root.cern.ch/) to display graphs, histograms, etc. All of this works just fine. Then I tried to implement a way for the user to start a background task while keeping the ability to type commands at the Lua prompt, for instance to do something else entirely while the task finishes, or to request that it stop.

My first attempt was the following. First, on the Lua side, I load some helper functions and initialize global variables:
```lua
-- Lua script
RootTasks = {}
NextTaskToStart = nil

function SetupNewTask(taskname, fn, ...)
    local task = function(...)
        local rets = table.pack(fn(...))
        RootTasks[taskname].status = "done"
        -- rets.n keeps trailing nil results that table.unpack(rets) could drop
        return table.unpack(rets, 1, rets.n)
    end
    RootTasks[taskname] = {
        task = SetupNewTask_C(task, ...),
        status = "waiting",
    }
    NextTaskToStart = taskname
end
```
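The local `task` closure above is a wrapper: it calls `fn`, records that the task is done, then passes the results through. A rough C++ analogue of that pattern is sketched below. It is only an illustration: `WrapWithStatus` is a hypothetical name, and unlike `table.pack` it assumes the wrapped callable returns exactly one (non-void) value.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Wraps a callable so that, once it returns, `status` is set to "done" and
// the callable's result is passed through unchanged (like the Lua `task`
// closure). `status` is captured by reference, so it must outlive the wrapper.
template <typename F>
auto WrapWithStatus(F fn, std::string& status)
{
    return [fn, &status](auto&&... args) {
        auto result = fn(std::forward<decltype(args)>(args)...);
        status = "done";
        return result;
    };
}
```

Until the wrapper is actually called, `status` keeps whatever value it had (here "waiting"), just as `RootTasks[taskname].status` does in the script.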
Then, on the C++ side:
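```cpp
// inside the C++ script
int SetupNewTask_C ( lua_State* L )
{
    // just a function to check if the argument is valid
    if ( !CheckLuaArgs ( L, 1, true, "SetupNewTask_C", LUA_TFUNCTION ) ) return 0;
    int nvals = lua_gettop ( L );
    lua_newtable ( L );
    // Move every value (the function first, then its arguments) from the
    // bottom of the stack into the new table at indices 1..nvals
    for ( int i = 0; i < nvals; i++ )
    {
        lua_pushvalue ( L, 1 );
        lua_remove ( L, 1 );
        lua_seti ( L, -2, i+1 );
    }
    // Only the table is left on the stack: return it
    return 1;
}
```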
Basically, the user provides the function to execute followed by the arguments to pass, and SetupNewTask_C returns a table with the function as the first field and the arguments as the subsequent fields. This table is left on top of the stack as the return value, and I store it in a global variable (the task field of RootTasks[taskname]). The next step is on the Lua side:
```lua
-- Lua script
function StartNewTask(taskname, fn, ...)
    SetupNewTask(taskname, fn, ...)
    StartNewTask_C()
    RootTasks[taskname].status = "running"
end
```
and on the C++ side:
```cpp
// In the C++ script
// lua, used below, is a pointer to the lua_State
// created when starting the Lua interpreter
void* NewTaskFn ( void* arg )
{
    // TryGetGlobalField is a helper to get global fields from
    // strings like "something.field.subfield".
    // Retrieve the name of the task to be started (it was stored in
    // a global variable by the previous call to SetupNewTask)
    TryGetGlobalField ( lua, "NextTaskToStart" );
    if ( lua_type ( lua, -1 ) != LUA_TSTRING )
    {
        cerr << "Next task to schedule is undetermined..." << endl;
        lua_pop ( lua, 1 );
        return nullptr;
    }
    string nextTask = lua_tostring ( lua, -1 );
    lua_pop ( lua, 1 );

    // Now we get the actual table with the function to execute
    // and the arguments
    TryGetGlobalField ( lua, "RootTasks." + nextTask );
    if ( lua_type ( lua, -1 ) != LUA_TTABLE )
    {
        cerr << "This task does not exist or has an invalid format..." << endl;
        lua_pop ( lua, 1 );
        return nullptr;
    }

    // The field "task" of that table contains the
    // function and its arguments
    lua_getfield ( lua, -1, "task" );
    if ( lua_type ( lua, -1 ) != LUA_TTABLE )
    {
        cerr << "This task has an invalid format..." << endl;
        lua_pop ( lua, 2 );
        return nullptr;
    }
    lua_remove ( lua, -2 );
    int taskStackPos = lua_gettop ( lua );

    // The first element of the table is the function, so the
    // number of arguments for that function is the table length - 1
    int nargs = static_cast<int> ( lua_rawlen ( lua, -1 ) ) - 1;
    // That will be the function
    lua_geti ( lua, taskStackPos, 1 );
    // And the arguments...
    for ( int i = 0; i < nargs; i++ )
    {
        lua_geti ( lua, taskStackPos, i+2 );
    }
    lua_remove ( lua, taskStackPos );

    // Reset the global variable NextTaskToStart as we are
    // about to start the scheduled task.
    lua_pushnil ( lua );
    TrySetGlobalField ( lua, "NextTaskToStart" );

    // Let's go!
    if ( lua_pcall ( lua, nargs, LUA_MULTRET, 0 ) != LUA_OK )
    {
        const char* msg = lua_tostring ( lua, -1 );
        cerr << "Task failed: " << ( msg ? msg : "(non-string error)" ) << endl;
        lua_pop ( lua, 1 );
    }
    return nullptr;
}

int StartNewTask_C ( lua_State* L )
{
    pthread_t newTask;
    pthread_create ( &newTask, nullptr, NewTaskFn, nullptr );
    // Nobody joins this thread, so let it release its resources on exit
    pthread_detach ( newTask );
    return 0;
}
```
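NewTaskFn finds out which task to run by reading the global NextTaskToStart. A common alternative is to hand per-task data to the thread through the last (`void*`) argument of `pthread_create`. The sketch below shows only that hand-off, with a hypothetical `TaskArgs` struct and no Lua involved. The struct must stay alive until the thread is done with it, and this does nothing to make sharing one lua_State between threads safe.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <string>

// Hypothetical per-task payload, passed to the thread directly instead of
// through a global variable that another thread could overwrite meanwhile.
struct TaskArgs
{
    std::string name;
    bool finished;
};

// Thread entry point: pthread_create expects void* (*)(void*), and every
// path must return a value.
void* RunTask(void* arg)
{
    TaskArgs* task = static_cast<TaskArgs*>(arg);
    std::printf("running task %s\n", task->name.c_str());
    task->finished = true;
    return nullptr;
}

// Starts the task on a new thread and hands back its id, so the caller can
// later pthread_join it or pthread_detach it to release its resources.
bool StartTask(TaskArgs* task, pthread_t* thread)
{
    return pthread_create(thread, nullptr, RunTask, task) == 0;
}
```

Joining (as a test would) guarantees that `finished` is read only after the thread wrote it; a fire-and-forget task would be detached instead.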
So, for instance, a call in the Lua interpreter to

```
> StartNewTask("PeriodicPrint", function(str) for i=1,10 do print(str);
>> sleep(1); end end, "Hello")
```
will print "Hello" every second for the next 10 seconds. It then returns from execution and everything is wonderful. Now, if I ever hit the ENTER key while that task is running, the program dies a horrible seg-fault death (which I don't copy here, as the error log is different each time; sometimes there is no error message at all). I read a bit online about what the matter could be and found several mentions that a lua_State is not thread-safe. I don't really understand why just hitting ENTER makes it flip out, but that's not really the point here.
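Not being thread-safe means the program itself must serialize every access to the shared object. The sketch below illustrates that general pattern with a `std::mutex` guarding a plain `std::vector` that stands in for an interpreter stack. It is a model, not the Lua API, and `SharedStack` and `PushMany` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for an interpreter state: a value stack that several threads
// want to modify. Every access goes through the same mutex.
class SharedStack
{
public:
    void Push(int value)
    {
        std::lock_guard<std::mutex> guard(mutex_);  // one thread at a time
        values_.push_back(value);
    }

    std::size_t Size()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return values_.size();
    }

private:
    std::mutex mutex_;
    std::vector<int> values_;
};

// Pushes `count` values from the calling thread.
void PushMany(SharedStack& stack, int count)
{
    for (int i = 0; i < count; ++i)
        stack.Push(i);
}
```

Applied to a lua_State, both the prompt thread and the task thread would have to hold such a lock around every Lua API call; holding it for the whole lua_pcall would block the prompt until the task ends.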
I discovered by accident that this approach works without any seg-faults after a tiny modification: instead of running the function directly, I run it inside a coroutine, and then everything I wrote above works just fine. Simply replace the previous Lua-side function SetupNewTask with:
```lua
function SetupNewTask(taskname, fn, ...)
    local task = coroutine.create(function(...)
        local rets = table.pack(fn(...))
        RootTasks[taskname].status = "done"
        -- rets.n keeps trailing nil results that table.unpack(rets) could drop
        return table.unpack(rets, 1, rets.n)
    end)
    local taskfn = function(...)
        coroutine.resume(task, ...)
    end
    RootTasks[taskname] = {
        task = SetupNewTask_C(taskfn, ...),
        routine = task,
        status = "waiting",
    }
    NextTaskToStart = taskname
end
```
With this version I can run several tasks at once for extended periods of time without getting any seg-faults. So we finally come to my question: why does using a coroutine work? What is the fundamental difference in this case? I just call coroutine.resume and never yield (or do anything else, for that matter). I then just wait for the coroutine to finish, and that's it. Are coroutines doing something I don't suspect?