152
votes

I work on a somewhat large web application, and the backend is mostly in PHP. There are several places in the code where I need to complete some task, but I don't want to make the user wait for the result. For example, when creating a new account, I need to send them a welcome email. But when they hit the 'Finish Registration' button, I don't want to make them wait until the email is actually sent, I just want to start the process, and return a message to the user right away.

Up until now, in some places I've been using what feels like a hack with exec(). Basically doing things like:

exec("doTask.php $arg1 $arg2 $arg3 >/dev/null 2>&1 &");

Which appears to work, but I'm wondering if there's a better way. I'm considering writing a system which queues up tasks in a MySQL table, and a separate long-running PHP script that queries that table once a second, and executes any new tasks it finds. This would also have the advantage of letting me split the tasks among several worker machines in the future if I needed to.

Am I re-inventing the wheel? Is there a better solution than the exec() hack or the MySQL queue?

15

15 Answers

82
votes

I've used the queuing approach, and it works well as you can defer that processing until your server load is idle, letting you manage your load quite effectively if you can partition off "tasks which aren't urgent" easily.

Rolling your own isn't too tricky, here's a few other options to check out:

  • GearMan - this answer was written in 2009, and since then GearMan looks a popular option, see comments below.
  • ActiveMQ if you want a full blown open source message queue.
  • ZeroMQ - this is a pretty cool socket library which makes it easy to write distributed code without having to worry too much about the socket programming itself. You could use it for message queuing on a single host - you would simply have your webapp push something to a queue that a continuously running console app would consume at the next suitable opportunity
  • beanstalkd - only found this one while writing this answer, but looks interesting
  • dropr is a PHP based message queue project, but hasn't been actively maintained since Sep 2010
  • php-enqueue is a recently (2017) maintained wrapper around a variety of queue systems
  • Finally, a blog post about using memcached for message queuing

Another, perhaps simpler, approach is to use ignore_user_abort - once you've sent the page to the user, you can do your final processing without fear of premature termination, though this does have the effect of appearing to prolong the page load from the user perspective.

23
votes

When you just want to execute one or several HTTP requests without having to wait for the response, there is a simple PHP solution, as well.

In the calling script:

$socketcon = fsockopen($host, 80, $errno, $errstr, 10);
if($socketcon) {   
   $socketdata = "GET $remote_house/script.php?parameters=... HTTP 1.1\r\nHost: $host\r\nConnection: Close\r\n\r\n";      
   fwrite($socketcon, $socketdata); 
   fclose($socketcon);
}
// repeat this with different parameters as often as you like

On the called script.php, you can invoke these PHP functions in the first lines:

ignore_user_abort(true);
set_time_limit(0);

This causes the script to continue running without time limit when the HTTP connection is closed.

17
votes

Another way to fork processes is via curl. You can set up your internal tasks as a webservice. For example:

Then in your user accessed scripts make calls to the service:

$service->addTask('t1', $data); // post data to URL via curl

Your service can keep track of the queue of tasks with mysql or whatever you like the point is: it's all wrapped up within the service and your script is just consuming URLs. This frees you up to move the service to another machine/server if necessary (ie easily scalable).

Adding http authorization or a custom authorization scheme (like Amazon's web services) lets you open up your tasks to be consumed by other people/services (if you want) and you could take it further and add a monitoring service on top to keep track of queue and task status.

It does take a bit of set-up work but there are a lot of benefits.

9
votes

If it just a question of providing expensive tasks, in case of php-fpm is supported, why not to use fastcgi_finish_request() function?

This function flushes all response data to the client and finishes the request. This allows for time consuming tasks to be performed without leaving the connection to the client open.

You don't really use asynchronicity in this way:

  1. Make all your main code first.
  2. Execute fastcgi_finish_request().
  3. Make all heavy stuff.

Once again php-fpm is needed.

7
votes

I've used Beanstalkd for one project, and planned to again. I've found it to be an excellent way to run asynchronous processes.

A couple of things I've done with it are:

  • Image resizing - and with a lightly loaded queue passing off to a CLI-based PHP script, resizing large (2mb+) images worked just fine, but trying to resize the same images within a mod_php instance was regularly running into memory-space issues (I limited the PHP process to 32MB, and the resizing took more than that)
  • near-future checks - beanstalkd has delays available to it (make this job available to run only after X seconds) - so I can fire off 5 or 10 checks for an event, a little later in time

I wrote a Zend-Framework based system to decode a 'nice' url, so for example, to resize an image it would call QueueTask('/image/resize/filename/example.jpg'). The URL was first decoded to an array(module,controller,action,parameters), and then converted to JSON for injection to the queue itself.

A long running cli script then picked up the job from the queue, ran it (via Zend_Router_Simple), and if required, put information into memcached for the website PHP to pick up as required when it was done.

One wrinkle I did also put in was that the cli-script only ran for 50 loops before restarting, but if it did want to restart as planned, it would do so immediately (being run via a bash-script). If there was a problem and I did exit(0) (the default value for exit; or die();) it would first pause for a couple of seconds.

5
votes

Here is a simple class I coded for my web application. It allows for forking PHP scripts and other scripts. Works on UNIX and Windows.

class BackgroundProcess {
    static function open($exec, $cwd = null) {
        if (!is_string($cwd)) {
            $cwd = @getcwd();
        }

        @chdir($cwd);

        if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
            $WshShell = new COM("WScript.Shell");
            $WshShell->CurrentDirectory = str_replace('/', '\\', $cwd);
            $WshShell->Run($exec, 0, false);
        } else {
            exec($exec . " > /dev/null 2>&1 &");
        }
    }

    static function fork($phpScript, $phpExec = null) {
        $cwd = dirname($phpScript);

        @putenv("PHP_FORCECLI=true");

        if (!is_string($phpExec) || !file_exists($phpExec)) {
            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
                $phpExec = str_replace('/', '\\', dirname(ini_get('extension_dir'))) . '\php.exe';

                if (@file_exists($phpExec)) {
                    BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
                }
            } else {
                $phpExec = exec("which php-cli");

                if ($phpExec[0] != '/') {
                    $phpExec = exec("which php");
                }

                if ($phpExec[0] == '/') {
                    BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
                }
            }
        } else {
            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
                $phpExec = str_replace('/', '\\', $phpExec);
            }

            BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
        }
    }
}
4
votes

This is the same method I have been using for a couple of years now and I haven't seen or found anything better. As people have said, PHP is single threaded, so there isn't much else you can do.

I have actually added one extra level to this and that's getting and storing the process id. This allows me to redirect to another page and have the user sit on that page, using AJAX to check if the process is complete (process id no longer exists). This is useful for cases where the length of the script would cause the browser to timeout, but the user needs to wait for that script to complete before the next step. (In my case it was processing large ZIP files with CSV like files that add up to 30 000 records to the database after which the user needs to confirm some information.)

I have also used a similar process for report generation. I'm not sure I'd use "background processing" for something such as an email, unless there is a real problem with a slow SMTP. Instead I might use a table as a queue and then have a process that runs every minute to send the emails within the queue. You would need to be warry of sending emails twice or other similar problems. I would consider a similar queueing process for other tasks as well.

4
votes

PHP HAS multithreading, its just not enabled by default, there is an extension called pthreads which does exactly that. You'll need php compiled with ZTS though. (Thread Safe) Links:

Examples

Another tutorial

pthreads PECL Extension

UPDATE: since PHP 7.2 parallel extension comes into play

Tutorial/Example

reference manual

2
votes

It's a great idea to use cURL as suggested by rojoca.

Here is an example. You can monitor text.txt while the script is running in background:

<?php

function doCurl($begin)
{
    echo "Do curl<br />\n";
    $url = 'http://'.$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
    $url = preg_replace('/\?.*/', '', $url);
    $url .= '?begin='.$begin;
    echo 'URL: '.$url.'<br>';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo 'Result: '.$result.'<br>';
    curl_close($ch);
}


if (empty($_GET['begin'])) {
    doCurl(1);
}
else {
    while (ob_get_level())
        ob_end_clean();
    header('Connection: close');
    ignore_user_abort();
    ob_start();
    echo 'Connection Closed';
    $size = ob_get_length();
    header("Content-Length: $size");
    ob_end_flush();
    flush();

    $begin = $_GET['begin'];
    $fp = fopen("text.txt", "w");
    fprintf($fp, "begin: %d\n", $begin);
    for ($i = 0; $i < 15; $i++) {
        sleep(1);
        fprintf($fp, "i: %d\n", $i);
    }
    fclose($fp);
    if ($begin < 10)
        doCurl($begin + 1);
}

?>
1
votes

Unfortunately PHP does not have any kind of native threading capabilities. So I think in this case you have no choice but to use some kind of custom code to do what you want to do.

If you search around the net for PHP threading stuff, some people have come up with ways to simulate threads on PHP.

1
votes

If you set the Content-Length HTTP header in your "Thank You For Registering" response, then the browser should close the connection after the specified number of bytes are received. This leaves the server side process running (assuming that ignore_user_abort is set) so it can finish working without making the end user wait.

Of course you will need to calculate the size of your response content before rendering the headers, but that's pretty easy for short responses (write output to a string, call strlen(), call header(), render string).

This approach has the advantage of not forcing you to manage a "front end" queue, and although you may need to do some work on the back end to prevent racing HTTP child processes from stepping on each other, that's something you needed to do already, anyway.

1
votes

If you don't want the full blown ActiveMQ, I recommend to consider RabbitMQ. RabbitMQ is lightweight messaging that uses the AMQP standard.

I recommend to also look into php-amqplib - a popular AMQP client library to access AMQP based message brokers.

0
votes

i think you should try this technique it will help to call as many as pages you like all pages will run at once independently without waiting for each page response as asynchronous.

cornjobpage.php //mainpage

    <?php

post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
//post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
//post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");
//call as many as pages you like all pages will run at once independently without waiting for each page response as asynchronous.
            ?>
            <?php

            /*
             * Executes a PHP page asynchronously so the current page does not have to wait for it to     finish running.
             *  
             */
            function post_async($url,$params)
            {

                $post_string = $params;

                $parts=parse_url($url);

                $fp = fsockopen($parts['host'],
                    isset($parts['port'])?$parts['port']:80,
                    $errno, $errstr, 30);

                $out = "GET ".$parts['path']."?$post_string"." HTTP/1.1\r\n";//you can use POST instead of GET if you like
                $out.= "Host: ".$parts['host']."\r\n";
                $out.= "Content-Type: application/x-www-form-urlencoded\r\n";
                $out.= "Content-Length: ".strlen($post_string)."\r\n";
                $out.= "Connection: Close\r\n\r\n";
                fwrite($fp, $out);
                fclose($fp);
            }
            ?>

testpage.php

    <?
    echo $_REQUEST["Keywordname"];//case1 Output > testValue
    ?>

PS:if you want to send url parameters as loop then follow this answer :https://stackoverflow.com/a/41225209/6295712

0
votes

Spawning new processes on the server using exec() or directly on another server using curl doesn't scale all that well at all, if we go for exec you are basically filling your server with long running processes which can be handled by other non web facing servers, and using curl ties up another server unless you build in some sort of load balancing.

I have used Gearman in a few situations and I find it better for this sort of use case. I can use a single job queue server to basically handle queuing of all the jobs needing to be done by the server and spin up worker servers, each of which can run as many instances of the worker process as needed, and scale up the number of worker servers as needed and spin them down when not needed. It also let's me shut down the worker processes entirely when needed and queues the jobs up until the workers come back online.

-4
votes

PHP is a single-threaded language, so there is no official way to start an asynchronous process with it other than using exec or popen. There is a blog post about that here. Your idea for a queue in MySQL is a good idea as well.

Your specific requirement here is for sending an email to the user. I'm curious as to why you are trying to do that asynchronously since sending an email is a pretty trivial and quick task to perform. I suppose if you are sending tons of email and your ISP is blocking you on suspicion of spamming, that might be one reason to queue, but other than that I can't think of any reason to do it this way.