How to make a mod_perl interpreter sticky by some conventions?

Question

Since mod_perl seems to manage Perl interpreters only per VHOST, is there any way I can influence which cloned interpreter mod_perl selects to process a request? I've read through the configurable scopes and looked into "modperl_interp_select" in the source, and I could see that if a request already has an interpreter associated with it, mod_perl selects that one:

else if (r) {
    if (is_subrequest && (scope == MP_INTERP_SCOPE_REQUEST)) {
[...]
    }
    else {
        p = r->pool;
        get_interp(p);
    }

I would like to add some kind of handler that runs before mod_perl selects an interpreter, and then select an interpreter myself and assign it to the request, based on different criteria contained in the request.

But I'm having trouble understanding whether such a handler can exist at all, or whether everything regarding a request is already processed by an interpreter selected by mod_perl.

Additionally, I can see the APR::Pool API, but it doesn't seem to provide a way to set user data on the current pool object, which is what mod_perl reads in "get_interp".

Could anyone help me on that? Thanks!

A bit of background: I have a directory structure in cgi-bin like the following:

cgi-bin
    software1
        customer1
            *.cgi
            *.pm
        customer2
            *.cgi
            *.pm
    software2
        customer1
            *.cgi
            *.pm
        customer2
            *.cgi
            *.pm

Each customer uses a private copy of the software, and the software products use each other, e.g. software1 of customer1 may talk to software2 of customer1 by loading some special client libs of software2 into its own Perl interpreter. To make things more complicated, software2 may even bring general/common parts of software1 into its own private installation by using svn:external. So I have a lot of copies of the same software with the same Perl packages in one VHOST, and I can't guarantee that all of those private installations are always at the same version level.

It's quite a mix-up, but it is known to work within a single Perl interpreter under the rules we have.

But now along comes mod_perl, which clones interpreters as needed and reuses them for requests into whichever subdirectory of cgi-bin it likes, and in this case things break: suddenly an interpreter that already processed software1 of customer1 has to process software2 of customer2, which uses common packages of software1. Those packages were already loaded by the interpreter before, and because of %INC they are used instead of the private packages of software2, and so on...
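A minimal, self-contained sketch of the %INC effect described above, outside of mod_perl: two throwaway installation roots both ship their own XY/Z.pm, and one interpreter requires XY::Z once per "request" with the respective root placed first in @INC. The helper make_install and the labels are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Create a throwaway "private installation" that ships its own XY/Z.pm.
sub make_install {
    my ($label) = @_;
    my $root = tempdir(CLEANUP => 1);
    make_path("$root/XY");
    open my $fh, '>', "$root/XY/Z.pm" or die "cannot write XY/Z.pm: $!";
    print {$fh} "package XY::Z;\nsub version { '$label' }\n1;\n";
    close $fh or die "cannot close XY/Z.pm: $!";
    return $root;
}

my $customer1 = make_install('customer1');
my $customer2 = make_install('customer2');

# First request handled by this interpreter: customer1's tree comes first.
{
    local @INC = ($customer1, @INC);
    require XY::Z;
}

# Second request, same interpreter: now customer2's tree comes first...
{
    local @INC = ($customer2, @INC);
    require XY::Z;    # ...but this is a no-op: $INC{'XY/Z.pm'} is already set
}

my $served = XY::Z->version;    # still customer1's code
print "XY::Z->version is '$served', loaded from $INC{'XY/Z.pm'}\n";
```

With plain CGI every request got a fresh interpreter, so this never showed up; with a reused interpreter the second require silently keeps the first installation's code.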

Yes, there are different ways to deal with that, like VHOSTs and subdomains per software or customer or whatever, but I would like to explore ways of keeping one VHOST and the current directory structure, using only what mod_perl or Apache httpd provides. One way would be if I could tell mod_perl to always use the same Perl interpreter for requests to the same directory. That way mod_perl would still create its pool of interpreters, and I would be responsible for selecting one of them per directory.

Do you have a specific reason to use mod_perl? Using Plack you gain (AFAIK) much greater control. (Maybe I missed some key points...) - jm666
Plack would be totally new to us; how would you achieve what I'm asking for using Plack? I would want it to run in httpd like mod_perl does, not stand-alone, but for various reasons I would need control over which Perl interpreter gets used for which served directory with plain old CGI scripts. The last point is important: interpreters must not be reused for different scripts/directories. - Thorsten Schöning
Maybe I misunderstood your requirements, but I would set up Apache (or any httpd, such as nginx) as a reverse proxy for certain URLs (each of your CGI scripts has its own URL), and run a separate Plack-based server for each script (on different ports) using CGI::Emulate::PSGI or CGI::PSGI. In short, each CGI script would run in its own Plack (Perl) server on its own port, in a totally isolated (but httpd-proxied) environment. - jm666
That's the direction I need, but it's too much configuration overhead. As I only need isolation per directory, not strictly per script, I could achieve something similar with VHOSTs and subdomains. The point is that I would like to avoid that and instead just use a special directory structure to stick Perl interpreters to directories automatically, without any additional configuration or setup for each new directory. - Thorsten Schöning
For emulating a classic ../cgi-bin (i.e. many scripts in one directory) there is Plack::App::CGIBin. But of course, if you must stick with Apache (and can't use a native Perl/Plack web server such as Starman or others), the easier(?) way would be to patch the mod_perl sources and maintain the patched version. ;) This was not a recommendation (not enough details), only a comment. (I'm a Plack fan) :) - jm666

Thorsten Schöning · Accepted Answer · 2015-01-08T10:41:19

What I've learned so far is that it's not easily possible to influence mod_perl's choice of interpreter. If one wants to, it seems one would really need to patch mod_perl at the C level, or provide one's own C handler for httpd as a hook that runs before mod_perl. After all, mod_perl is only a collection of handlers for httpd itself, so placing another one in front of it that does some special things is possible. Whether it's wise to do so is another question, because one would have to deal with some mod_perl internals, such as the fact that no interpreter is available at that point, and in my case I would also need a map somewhere of interpreters and the directories they are associated with.

In the end it's not that easy, and I don't want to patch mod_perl or start writing a low-level C handler/hook for httpd.

For documentation purposes I want to mention two possible workarounds that came to mind:

Pool of Perl threads

The problem with the current mod_perl approach in my case is that it clones Perl interpreters at a low level in C, and those interpreters are run by threads from an httpd thread pool; every thread can run any interpreter at any given time, as long as it's not already in use by another thread. With this approach it seems impossible to access the interpreters from within Perl itself without also using some low-level XS. In particular, it's not possible to manage the interpreters and threads with Perl's threads API, simply because they aren't Perl threads; they're Perl interpreters executed by httpd threads. In the end both behave the same, though, because creating a Perl thread also clones the current interpreter and creates an associated OS thread. But with Perl's threads you have more influence over shared data and such.

So a workaround for my current problem could be to not let mod_perl and its interpreters process a request, but instead to create one's own pool of Perl threads directly at startup in a VHOST, using PerlModule or similar. Those threads can be managed entirely within Perl; one could create queues to dispatch work in the form of absolute paths to requested CGI applications. Besides the thread pool itself, a handler would be needed that gets called instead of e.g. ModPerl::Registry and acts as a dispatcher: it would decide, based on some criteria, which thread to use and put the requested path into that thread's queue, and the thread itself could ultimately, for example, just create new instances of ModPerl::Registry to process the given file. Of course some glue would be needed here and there...
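The dispatcher's decision itself could be as simple as the following sketch, assuming the criterion is the directory of the requested script, so that each customer installation always sticks to one worker. The names worker_for and %worker_for_dir are hypothetical; the actual thread pool (e.g. threads plus one Thread::Queue per worker) and the ModPerl::Registry glue are deliberately not shown.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical dispatcher state: installation directory => worker number.
# In the scheme above, a worker number would pick a Perl thread and its queue.
my %worker_for_dir;
my $next_worker = 0;

# Return the worker that must process the given CGI script. The first script
# seen from a directory allocates a new worker; every later script from the
# same directory is mapped to that same worker again.
sub worker_for {
    my ($script_path) = @_;
    my $dir = dirname($script_path);
    $worker_for_dir{$dir} //= $next_worker++;
    return $worker_for_dir{$dir};
}

for my $path (qw(
    /srv/cgi-bin/software1/customer1/index.cgi
    /srv/cgi-bin/software1/customer1/report.cgi
    /srv/cgi-bin/software2/customer2/index.cgi
)) {
    printf "%-45s -> worker %d\n", $path, worker_for($path);
}
```

The pool grows with the number of directories instead of being capped, because a cap would force two installations to share an interpreter, which is exactly what must not happen here.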

There are some downsides to this approach, of course: it sounds like a fair amount of work, it duplicates some of the functionality already implemented by mod_perl, especially pool maintenance, and it doubles the number of threads and the memory used, because the mod_perl interpreters and threads would only be used to execute the dispatcher handler, while additional threads and interpreters process the requests within the Perl threads. The number of threads shouldn't be a big problem, though, since the mod_perl thread would just sleep and wait for the Perl thread to finish its work.

@INC-Hook with source code changes

Another, and I guess easier, approach would be to make use of @INC hooks for Perl's require, again in combination with a custom mod_perl handler extending ModPerl::Registry. The key point is that the handler is the first place where the requested file is read, so its source can be changed before it's compiled. Every

use XY::Z;
XY::Z->new(...);

could be changed to

use SomePrefix::XY::Z;
SomePrefix::XY::Z->new(...);

where SomePrefix would simply be the full path of the parent directory of the requested file, converted into a valid Perl package name. ModPerl::Registry already does something similar when it automatically turns a requested CGI script into a mod_perl handler, so this works in general, and ModPerl::Registry already provides some logic to generate the package name and such. As a result of the change, Perl won't find the packages automatically anymore, simply because they don't exist under the new name anywhere Perl knows about; that's where the @INC hook comes in.
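The textual rewrite could look roughly like this sketch. dir_to_prefix and rewrite_source are hypothetical helpers, the "Inst_" marker is an arbitrary choice, and this is not the naming logic ModPerl::Registry actually uses. The regexes only cover the simple cases from the example above: package/use/require statements and Class->method calls, and only for package names the caller declares as belonging to the installation (so "use strict;" stays untouched).

```perl
use strict;
use warnings;

# Turn an installation directory into one valid package-name component, e.g.
# /srv/cgi-bin/software1/customer1 -> Inst_srv_cgi_bin_software1_customer1.
# Not injective ("a-b" and "a_b" collide); good enough for a sketch.
sub dir_to_prefix {
    my ($dir) = @_;
    (my $p = $dir) =~ s{[^A-Za-z0-9]+}{_}g;
    $p =~ s{^_+|_+$}{}g;
    return "Inst_$p";
}

# Put $prefix in front of every package name listed in %$local when it
# follows package/use/require or is the class of a ->method call.
sub rewrite_source {
    my ($src, $prefix, $local) = @_;
    my $name = qr/[A-Za-z_]\w*(?:::\w+)*/;

    $src =~ s{\b(package|use|require)(\s+)($name)}
             {$local->{$3} ? "$1$2${prefix}::$3" : "$1$2$3"}ge;
    # The lookbehind skips variables like $obj-> and parts of longer names.
    $src =~ s{(?<![\w:\$\@%&])($name)(?=\s*->)}
             {$local->{$1} ? "${prefix}::$1" : $1}ge;
    return $src;
}

my $prefix = dir_to_prefix('/srv/cgi-bin/software1/customer1');
print rewrite_source("use strict;\nuse XY::Z;\nmy \$o = XY::Z->new(1);\n",
                     $prefix, { 'XY::Z' => 1 });
```

Things like string evals or computed class names are not caught by these regexes, which matches the limitation mentioned further below.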

The hook is responsible for recognizing such changed packages, simply by the name SomePrefix, or a marker prefix in front of SomePrefix, or whatever, and for mapping them to a file in the file system, providing a handle to the requested file which Perl can load during "require". Additionally, the hook will provide a callback that Perl calls for each line read from the file and that acts as a source filter, again changing each "package", "use" or "require" statement to have SomePrefix in front of it. This in turn makes the hook responsible for providing file handles to those packages, and so on.
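A minimal sketch of such an @INC hook, assuming a single-component "Inst_..." prefix and a hypothetical %ROOT_FOR registry (prefix to installation directory) that the Registry-style handler would fill before compiling a rewritten script. Instead of the per-line filter callback described above (a code ref returned after the filehandle), this variant reads the whole file, rewrites it, and returns an in-memory filehandle, which require accepts from a hook as well. Only names whose .pm file actually exists inside the installation get the prefix.

```perl
use strict;
use warnings;

# Hypothetical registry: prefix => installation root directory, e.g.
# Inst_srv_cgi_bin_software1_customer1 => /srv/cgi-bin/software1/customer1
our %ROOT_FOR;

# Prefix a package name if the installation under $root ships its .pm file.
sub _maybe_prefix {
    my ($keyword, $ws, $pkg, $prefix, $root) = @_;
    (my $rel = "$pkg.pm") =~ s{::}{/}g;
    return -e "$root/$rel" ? "$keyword$ws${prefix}::$pkg" : "$keyword$ws$pkg";
}

# Rewrite package/use/require statements of a module loaded from $root.
sub prefix_module_source {
    my ($src, $prefix, $root) = @_;
    $src =~ s{\b(package|use|require)(\s+)([A-Za-z_]\w*(?:::\w+)*)}
             {_maybe_prefix($1, $2, $3, $prefix, $root)}ge;
    return $src;
}

# The hook only handles "Inst_<something>/XY/Z.pm"-style file names.
unshift @INC, sub {
    my (undef, $file) = @_;
    my ($prefix, $rel) = $file =~ m{^(Inst_\w+)/(.+\.pm)\z} or return;
    my $root = $ROOT_FOR{$prefix} or return;    # unknown prefix: try rest of @INC
    open my $in, '<', "$root/$rel" or return;   # not part of this installation
    my $src = do { local $/; <$in> };
    close $in;
    $src = prefix_module_source($src, $prefix, $root);
    open my $fh, '<', \$src or die "cannot open in-memory source for $file: $!";
    return $fh;    # require compiles the rewritten source from this handle
};
```

Once two installations are registered, "require Inst_a::XY::Z" and "require Inst_b::XY::Z" load two independent copies of XY::Z into the same interpreter, each recorded under its own %INC key.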

The key point here is changing the source code once, at runtime: instead of "XY/Z.pm", which Perl would normally require, which would exist n times in my directory structure and would be saved as "XY/Z.pm" in %INC, one lets Perl require "SomePrefix/XY/Z.pm". That is what gets stored in %INC, and it is unique within every Perl interpreter used by mod_perl, because SomePrefix reflects the unique installation directory of the requested file. There's no room anymore for Perl to think it had already loaded XY::Z just because it processed a request from another directory before.

Of course this only works for simple "use ...;" statements; things like "eval("require $package");" make things a bit more complicated.

Comments welcome... :-)
