
I am currently writing a runtime system for distributed systems, which I then intend to use to evaluate some parallelism-management strategies. I based the runtime on the task programming model of the OpenMP 3.0 standard, but it targets a different category of machines, using MPI.

To do that, I create one MPI process per machine and launch several threads in each. One master process is responsible for creating new tasks for the other processes, so it needs to send them the work to do. Each task contains a function pointer (the work to do) and a set of arguments to pass to that function. Something like this:

    class Task
    {
      public:
        typedef struct
        {
          // ... Storing and packing arguments
        } args_t;
        Task();
        ~Task();
        void exec()
        {
          // Executing the function pointed by "func_ptr"
          // with the specified arguments in "args"
          func_ptr( args );
        }
      private:
        void (*func_ptr)(args_t);
        args_t args;
    };

To pass the arguments, I intend to use MPI_Type_create_struct. My problem now is: how do I send the function to another MPI process? If I send the function pointer, it will no longer be valid in the address space of the receiving MPI process. Since I cannot know in advance how many different types of tasks I will have, there is an additional difficulty: I cannot build a corresponding map and just send a unique ID to the MPI process. Do you have any idea how to solve this problem?
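For reference, the map-based approach mentioned above can be sketched as follows. This is only a minimal sketch under a strong assumption: every process registers the same functions in the same order (for example at startup), so an ID means the same thing on every rank. `TaskRegistry`, `task_fn` and the single-field `args_t` are hypothetical names used for illustration. The ID would be what gets sent over MPI (for instance with MPI_Send) instead of the pointer.

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical argument bundle standing in for Task::args_t.
struct args_t {
    int value;
};

using task_fn = void (*)(args_t);

// Hypothetical registry: each process fills its own copy with the same
// functions in the same order, so an ID is valid on every process even
// though the raw function addresses may differ.
class TaskRegistry {
public:
    // Register a function and return the ID to send instead of the pointer.
    std::uint32_t add(task_fn fn) {
        std::uint32_t id = next_id_++;
        table_[id] = fn;
        return id;
    }

    // Resolve an ID received from another process to a local pointer.
    task_fn find(std::uint32_t id) const {
        auto it = table_.find(id);
        if (it == table_.end()) {
            throw std::out_of_range("unknown task id");
        }
        return it->second;
    }

private:
    std::uint32_t next_id_ = 0;
    std::unordered_map<std::uint32_t, task_fn> table_;
};
```

On the receiving side, the worker would call `registry.find(id)(args)` after unpacking the ID and the arguments. Because registration happens at run time, the number of task types does not need to be fixed at compile time, only registered consistently on every process.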

Thank you !

How can you not know the "number of different type of tasks" at any time? Your analysis of the limitations is pretty much on point. What you want is pretty much infeasible, barring really hacky hacks. – Zulan
Well, in fact I can know how many types I will have during execution. What I am really looking for is a trick that lets me get around these limitations. Maybe a way to launch a "sub-program" on an MPI process instead of sending a function or a function pointer. However, I have no idea whether that is feasible. Maybe someone has already faced a similar problem? – Adrien Roussel
Communicating functions via MPI is unheard of. But you can definitely create and terminate a set of processes from a set of running processes using MPI_Comm_spawn and MPI_Comm_spawn_multiple, if this is what you meant by starting a "sub-program". – ggulgulia
If all your MPI tasks run the very same binary, all the subroutines should be mapped at the same address, so sending a pointer might work. If all the subroutines are in the address space of all the MPI tasks, then you can pass the subroutine name (e.g. a null-terminated string) and use dlsym() to find its address. – Gilles Gouaillardet
If you do something very dynamic, a hacky option is to generate a dynamic library "on the fly", transfer it as binary data, and then use dlopen() and dlsym() to retrieve the function's address. – Gilles Gouaillardet

1 Answer


As proposed by Gilles Gouaillardet, I tried to solve this issue using the dlopen() and dlsym() functions. I wrote a small program to look up a pointer to a helloWorld function:

    #include <dlfcn.h>
    #include <cstdio>    // fprintf
    #include <cstdlib>   // exit, EXIT_FAILURE, EXIT_SUCCESS
    #include <iostream>

    void helloWorld(void)
    {
      std::cout << "Hello World !" << std::endl;
    }

    int main(int argc, char** argv)
    {
        void *handle;
        void (*task)(void);
        char* error;
        handle = dlopen(NULL, RTLD_LAZY);
        if(!handle)
        {
          fprintf(stderr, "dlopen error: %s\n", dlerror());
          exit(EXIT_FAILURE);
        }
        dlerror();

        *(void **) (&task) = dlsym(handle, "helloWorld");
        if( (error = dlerror()) != NULL)
        {
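          // Note: this second dlerror() call returns NULL, because the call
          // in the condition above already cleared the pending error.
          fprintf(stderr, "dlsym error: %s\n", dlerror());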
          exit(EXIT_FAILURE);
        }
        dlclose(handle);

      return EXIT_SUCCESS;
    }

However, dlsym is not able to find the helloWorld function, and the program prints this error message:

    dlsym error: (null)

I have not found a solution to this problem yet, but I am still looking. If anyone has experience with the dlsym function, please share it with me.

EDIT: I passed NULL to dlopen based on the dlopen man page (https://linux.die.net/man/3/dlsym), which specifies:

The function dlopen() loads the dynamic library file named by the null-terminated string filename and returns an opaque "handle" for the dynamic library. If filename is NULL, then the returned handle is for the main program.
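A likely explanation for the failed lookup above, offered as a sketch and not as a confirmed diagnosis of this exact setup, has three parts. First, a C++ compiler mangles the name helloWorld, so no symbol with that exact spelling exists unless the function is declared `extern "C"`. Second, symbols defined in the main executable are typically only visible to dlsym() when the program is linked with `-rdynamic` (GCC or Clang on Linux). Third, the `(null)` text comes from calling dlerror() a second time after the first call has already cleared the error. The helper name `find_task` and the alias `task_fn` are hypothetical.

```cpp
#include <dlfcn.h>
#include <cstdio>

// extern "C" disables C++ name mangling, so the exported symbol
// is literally named "helloWorld".
extern "C" void helloWorld(void)
{
    std::puts("Hello World !");
}

using task_fn = void (*)(void);

// Hypothetical helper: look up a task function by symbol name in the
// running program. Returns nullptr and reports the reason on failure.
task_fn find_task(const char* name)
{
    // A null filename asks for a handle on the main program itself.
    void* handle = dlopen(nullptr, RTLD_LAZY);
    if (!handle) {
        std::fprintf(stderr, "dlopen error: %s\n", dlerror());
        return nullptr;
    }

    dlerror();                      // clear any stale error state
    void* sym = dlsym(handle, name);
    const char* error = dlerror();  // call once and keep the result
    dlclose(handle);                // the main program itself stays loaded

    if (error != nullptr) {
        std::fprintf(stderr, "dlsym error: %s\n", error);
        return nullptr;
    }
    // Converting void* to a function pointer is the usual POSIX idiom.
    return reinterpret_cast<task_fn>(sym);
}
```

Built with something like `g++ -rdynamic prog.cpp -ldl`, a call such as `find_task("helloWorld")` should return a callable pointer. Without `-rdynamic` it is expected to return nullptr and print the actual dlsym error text. In the MPI setting discussed in the comments, the symbol name would be sent as a character buffer and resolved this way on the receiving process.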