46
votes

I have a lot of trouble with the combination of symfony2 and doctrine2. I have to deal with huge datasets (around 2-3 million write and read) and have to do a lot of additional effort to avoid running out of memory.

I figgured out 2 main points, that "leak"ing memory (they are actually not really leaking, but allocating a lot).

  1. The Entitymanager entity storage (I don't know the real name of this one) it seems like it keeps all processed entities and you have to clear this storage regularly with

    $entityManager->clear()
  2. The Doctrine QueryCache - it caches all used Queries and the only configuration I found was, that you are able to decide what kind of Cache you wanna use. I didn't find a global disable neither a useful flag for each query to disable it. So usually I disable it for every query object with the function

     $qb = $repository->createQueryBuilder($a);
     $query = $qb->getQuery();
     $query->useQueryCache(false);
     $query->execute();
     

So.. that's all I figured out right now.. My questions are:

Is there a easy way to deny some objects from the Entitymanagerstorage? Is there a way to set the querycache use in the entitymanager? Can I configure this caching behaviors somewhere in the Symfony/doctrine configuration?

Would be very cool if someone has some nice tips for me.. otherwise this may help some rookie..

cya

8
The D2 ORM layer is not really designed for massive batch processing. You might be better off using the DBAL layer and just working with arrays. - Cerad
Running with --no-debug helps a lot (in debug mode the profiler retains informations about every single query in memory) - Arnaud Le Blanc

8 Answers

89
votes

As stated by the Doctrine Configuration Reference by default logging of the SQL connection is set to the value of kernel.debug, so if you have instantiated AppKernel with debug set to true the SQL commands get stored in memory for each iteration.

You should either instantiate AppKernel to false, set logging to false in you config YML, or either set the SQLLogger manually to null before using the EntityManager

$em->getConnection()->getConfiguration()->setSQLLogger(null);
18
votes

Try running your command with --no-debug. In debug mode the profiler retains informations about every single query in memory.

15
votes

1. Turn off logging and profiling in app/config/config.yml

doctrine:
    dbal:
        driver: ...
        ...
        logging: false
        profiling: false

or in code

$this->entityManager->getConnection()->getConfiguration()->setSQLLogger(null);

2. Force garbage collector. If you actively use CPU then garbage collector waits and you can find yourself with no memory soon.

At first enable manual garbage collection managing. Run gc_enable() anywhere in the code. Then run gc_collect_cycles() to force garbage collector.

Example

public function execute(InputInterface $input, OutputInterface $output)
{
    gc_enable();

    // I'm initing $this->entityManager in __construct using DependencyInjection
    $customers = $this->entityManager->getRepository(Customer::class)->findAll();

    $counter = 0;
    foreach ($customers as $customer) {
        // process customer - some logic here, $this->em->persist and so on

        if (++$counter % 100 == 0) {
            $this->entityManager->flush(); // save unsaved changes
            $this->entityManager->clear(); // clear doctrine managed entities
            gc_collect_cycles(); // PHP garbage collect

            // Note that $this->entityManager->clear() detaches all managed entities,
            // may be you need some; reinit them here
        }
    }

    // don't forget to flush in the end
    $this->entityManager->flush();
    $this->entityManager->clear();
    gc_collect_cycles();
}

If your table is very large, don't use findAll. Use iterator - http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/batch-processing.html#iterating-results

10
votes
  1. Set SQL logger to null

$em->getConnection()->getConfiguration()->setSQLLogger(null);

  1. Manually call function gc_collect_cycles() after $em->clear()

$em->clear(); gc_collect_cycles();

Don't forget to set zend.enable_gc to 1, or manually call gc_enable() before use gc_collect_cycles()

  1. Add --no-debug option if you run command from console.
4
votes

got some "funny" news from doctrine developers itself on the symfony live in berlin - they say, that on large batches, us should not use an orm .. it is just no efficient to build stuff like that in oop

.. yeah.. maybe they are right xD

3
votes

As per the standard Doctrine2 documentation, you'll need to manually clear or detatch entities.

In addition to that, when profiling is enabled (as in the default dev environment). The DoctrineBundle in Symfony2 configures a several loggers use quite a bit of memory. You can disable logging completely, but it is not required.

An interesting side effect, is the loggers affect both Doctrine ORM and DBAL. One of loggers will result in additional memory usage for any service that uses the default logger service. Disabling all of these would be ideal in commands-- since the profiler isn't used there yet.

Here is what you can do to disable the memory-intense loggers while keeping profiling enabled in other parts of Symfony2:

$c = $this->getContainer();
/* 
 * The default dbalLogger is configured to keep "stopwatch" events for every query executed
 * the only way to disable this, as of Symfony 2.3, Doctrine Bundle 1.2, is to reinistiate the class
 */

$dbalLoggerClass = $c->getParameter('doctrine.dbal.logger.class');
$dbalLogger = new $dbalLoggerClass($c->get('logger'));   
$c->set('doctrine.dbal.logger', $dbalLogger);

// sometimes you need to configure doctrine to use the newly logger manually, like this
$doctrineConfiguration = $c->get('doctrine')->getManager()->getConnection()->getConfiguration();
$doctrineConfiguration->setSQLLogger($dbalLogger);

/*
 * If profiling is enabled, this service will store every query in an array
 * fortunately, this is configurable with a property "enabled"
 */
if($c->has('doctrine.dbal.logger.profiling.default'))
{
    $c->get('doctrine.dbal.logger.profiling.default')->enabled = false;
}

/*
 * When profiling is enabled, the Monolog bundle configures a DebugHandler that 
 * will store every log messgae in memory. 
 *
 * As of Monolog 1.6, to remove/disable this logger: we have to pop all the handlers
 * and then push them back on (in the correct order)
 */
$handlers = array();
try
{   
    while($handler = $logger->popHandler())
    {
        if($handler instanceOf \Symfony\Bridge\Monolog\Handler\DebugHandler)
        {
            continue;
        }
        array_unshift($handlers, $handler);
    }
}
catch(\LogicException $e)
{
    /*
     * As of Monolog 1.6, there is no way to know if there's a handler
     * available to pop off except for the \LogicException that's thrown.
     */
    if($e->getMessage() != 'You tried to pop from an empty handler stack.')
    {
        /*
         * this probably doesn't matter, and will probably break in the future
         * this is here for the sake of people not knowing what they're doing
         * so than an unknown exception is not silently discarded.
         */

        // remove at your own risk
        throw $e;
    }
}

// push the handlers back on
foreach($handlers as $handler)
{
    $logger->pushHandler($handler);
}
0
votes

Try disabling any Doctrine caches that exist. (If you're not using APC / other as a cache then memory is used).

Remove Query Cache

$qb = $repository->createQueryBuilder($a);
$query = $qb->getQuery();
$query->useQueryCache(false);
$query->useResultCache(false);
$query->execute();

There's no way to globally disable it

Also this is an alternative to clear that might help (from here)

$connection = $em->getCurrentConnection();
$tables = $connection->getTables();
foreach ( $tables as $table ) {
    $table->clear();
}
0
votes

I just posted a bunch of tips for using Symfony console commands with Doctrine for batch processing here.