8
votes

Which design pattern exist to realize the execution of some PHP processes and the collection of the results in one PHP process?

Background:
I do have many large trees (> 10000 entries) in PHP and have to run recursive checks on it. I want to reduce the elapsed execution time.

7
... write a C extension?jldupont

7 Answers

10
votes

If your goal is minimal time - the solution is simple to describe, but not that simple to implement.

You need to find a pattern to divide the work (You don't provide much information in the question in this regard).

Then use one master process that forks children to do the work. As a rule the total number of processes you use should be between n and 2n, where n is the number of cores the machine has.

Assuming this data will be stored in files you might consider using non-blocking IO to maximize the throughput. Not doing so will make most of your process spend time waiting for the disk. PHP has stream_select() that might help you. Note that using it is not trivial.

If you decide not to use select - increasing the number of processes might help.


In regards to pcntl functions: I've written a deamon with them (a proper one with forking, changing session id, the running user, etc...) and it's one of the most reliable piece of software I've written. Because it spawns workers for every task, even if there is a bug in one of the tasks, it does not affect the others.

11
votes

From your php script, you could launch another script (using exec) to do the processing. Save status updates in a text file, which could then be read periodically by the parent thread.

Note: to avoid php waiting for the exec'd script to complete, pipe the output to a file:

exec('/path/to/file.php | output.log');

Alternatively, you can fork a script using the PCNTL functions. This uses one php script, which when forked can detect whether it is the parent or the child and operate accordingly. There are functions to send/receive signals for the purpose of communicating between parent/child, or you have the child log to a file and the parent read from that file.

From the pcntl_fork manual page:

$pid = pcntl_fork();
if ($pid == -1) {
     die('could not fork');
} else if ($pid) {
     // we are the parent
     pcntl_wait($status); //Protect against Zombie children
} else {
     // we are the child
}
4
votes

This might be a good time to consider using a message queue, even if you run it all on one machine.

3
votes

The question seems to be a bit confused.

I want to reduce the absolute execution time.

Do you mean elapsed time? Certainly use of the right data-structure will improve throughput, but for a given data-structure, the minmimum order of the algorithm is absolute, and nothing to do with how you implement the algorithm.

Which design pattern exist to realize....?

Design Patterns are something which code is, not a template for writing programs, and a useful tools for curriculum design. To start with a pattern and make your code fit it is in itself an anti-pattern.

Nobody can answer this question withuot knowing a lot more about your data and how its structured, however the key driver for efficiency will be the data-structure you use to implement your tree. If elapsed time is important then certainly look at parallel execution, however it may also be worth considering performing the operation in a different tool - databases are highly optimized for dealing with large sets of data, however note that the obvious method for describing a tree in a relational database is very inefficient when it comes to isolating sub-trees and walking the tree.

In response to Adam's suggesting of forking you replied:

I "heard" that pcntl isnt a good solution. Any experiences?

Where did you hear that? Certainly forking from a CGI or mod_php invoked script is a bad idea, but nothing wrong with doing it from the command line. Do have a google for long running PHP processes (be warned there is a lot of bad information out there). What code you write will vary depending on the underlying OS - which you've not stated.

I suspect that you could solve a large part of your performance issues by identifying which parts of the tree need to be checked and only checking those parts AND triggering the checks when the tree is updated, or at least marking the nodes as 'dirty'.

You might find these helpful:

http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ http://en.wikipedia.org/wiki/Threaded_binary_tree

C.

3
votes

You could use a more efficient data structure, such as a btree. I used once in Java but not in PHP. You can try this script: http://www.phpclasses.org/browse/file/708.html, it is an implementation of btree.

If it is not enough, you can use Hadoop to implement a Map/Reduce pattern, as Michael said. I would not fork PHP process, it does not seem to help for performace.

Personally, I would use PHP as client and put everything in Hadoop. This tutorial might help: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html.

Another solution can be to use a Java implementation of Btree: http://jdbm.sourceforge.net/. JDBM is an object database using a Btree+ data astructures. Then you can search with PHP by exposing data with a web service or by accessing it directly with Quercus

2
votes

Using web or CLI?

If you use web, you could intergrate that part in Quercus Then you could use the advantages of JAVA multithreading.

I don't actually know how reliable Quercus is though. I'd also suggest using a kind of message queue and refactoring the code, so it doesn't need the scope.

Maybe you could rebuild the code to a Map/Reduce pattern. You then can run the PHP code in Hadoop Then you can cluster the processing through a couple of machines.

I don't know if it's useful, but I came across another project, called Gearman. It's also used to cluster PHP processes. I guess you can combine that with a reduce script as well, if Hadoop is not the way you want to go.

0
votes

pthreads

There is a rather new (since 2012) PHP extension available: pthreads. It can be installed via PECL.

Simple Implementation in PHP Code: extend from Thread Class. Add a run() method and execute the start() method.

<?php
// Example from http://www.phpgangsta.de/richtige-threads-in-php-einfach-erstellen-mit-pthreads
class AsyncOperation extends Thread
{
    public function __construct($threadId)
    {
        $this->threadId = $threadId;
    }
 
    public function run()
    {
        printf("T %s: Sleeping 3sec\n", $this->threadId);
        sleep(3);
        printf("T %s: Hello World\n", $this->threadId);
    }
}
 
$start = microtime(true);
for ($i = 1; $i <= 5; $i++) {
    $t[$i] = new AsyncOperation($i);
    $t[$i]->start();
}
echo microtime(true) - $start . "\n";
echo "end\n";

Outputs

>php pthreads.php
0.041301012039185
end
T 1: Sleeping 3sec
T 2: Sleeping 3sec
T 3: Sleeping 3sec
T 4: Sleeping 3sec
T 5: Sleeping 3sec
T 1: Hello World
T 2: Hello World
T 3: Hello World
T 4: Hello World
T 5: Hello World