I'm animating a convergence process that I'm simulating in an IPython 3.1 notebook. I'm visualizing the scatter plot result in a matplotlib animation, which I'm writing out to an animated gif via ImageMagick. There are 3000 frames, each with about 5000 points.

I'm not sure exactly how matplotlib creates these animation files, but it appears to cache up a bunch of frames and then write them out all together-- when I look at the CPU usage, it's dominated by python in the beginning and then by convert at the end.

Writing out the gif is happening exceedingly slowly. It's taking more than an hour to write out a 70MB file to an SSD on a modern MacBook Pro. 'convert' is using the equivalent of 90% of one core on a 4-core (8 hyperthread) machine.

It takes about 15 minutes to write the first 65MB, and over 2 hours to write the last 5MB.

I think the interesting bits of the code follow-- if there's something else that would be helpful, let me know.

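import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

# co (the simulation object) and mags (one step size per frame) are defined elsewhere in the notebook.

def updateAnim(i,cg,scat,mags):
    # Advance the simulation one step and move the existing scatter points.
    if mags[i]==0: return scat,              # zero step: leave this frame unchanged
    cg.convergeStep(mags[i])
    scat.set_offsets(cg._chrgs[::2,0:2])     # every second row of _chrgs, x/y columns
    return scat,

fig=plt.figure(figsize=(6,10))
plt.axis('equal')
plt.xlim(-1.2,1.2);plt.ylim(-1,3)
c=np.where(co._chrgs[::2,3]>0,'blue','red')  # color by the sign of column 3
scat=plt.scatter(co._chrgs[::2,0],co._chrgs[::2,1],s=4,color=c,marker='o',alpha=0.25);
ani=animation.FuncAnimation(fig,updateAnim,frames=mags.size,fargs=(co,scat,mags),blit=True);
ani.save('Files/Capacitance/SpherePlateAnimation.gif',writer='imagemagick',fps=30);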

Any idea what the bottleneck might be or how I might speed it up? I'd prefer the write-out time to be small compared to the simulation time.

Version: ImageMagick 6.9.0-0 Q16 x86_64 2015-05-30 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules
Delegates (built-in): bzlib cairo djvu fftw fontconfig freetype gslib gvc jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png ps rsvg tiff webp wmf x xml zlib

ps -aef reports:

convert -size 432x720 -depth 8 -delay 3.3333333333333335 -loop 0 rgba:- Files/Capacitance/SpherePlateAnimation.gif

I don't know matplotlib, but I am quite familiar with ImageMagick. Can you see how convert is being called? I mean, what parameters does it receive? - Mark Setchell
You can hopefully see the parameters by running ps -aef | grep convert in Terminal. - Mark Setchell
Added the convert call parameters to the question. Thanks for looking at this! - Omegaman
FYI, there are known issues with using blit=True on OS X. (matplotlib.1069221.n5.nabble.com/…) - Orko

Answer

Update

Please read the original answer below before doing anything suggested in this update.

If you want to debug this in some depth, you could separate the ImageMagick part out and identify for sure where the issue is. To do that, I would locate your ImageMagick convert program like this:

which convert    # result may be "/usr/local/bin/convert"

and then go to the containing directory, e.g.

cd /usr/local/bin

Now rename the original convert program to convert.real. You can always restore it later by running the mv command below with its two arguments swapped:

mv convert convert.real

Now, save the following file as convert

#!/bin/bash
# Ignore all arguments; just copy the raw RGBA stream from stdin to a file.
dd bs=128k > "$HOME/plot.rgba" 2> /dev/null

and make that executable by doing

chmod +x convert

Now, when you run matplotlib again, it will execute the script above instead of ImageMagick, and the script will save the raw RGBA data to a file called plot.rgba in your home directory. That tells you two things: first, whether matplotlib runs faster now that there is no ImageMagick processing; second, whether the file size is around 4GB, as I am guessing.
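Once the capture has run, a small Python helper can report the file size and check that it divides into whole frames; leftover bytes would suggest a frame size different from the assumed 432x720x4. This is a sketch: the helper names are hypothetical, and the path assumes the wrapper script above wrote to $HOME/plot.rgba.

```python
import os

# Frame geometry taken from the convert command line in the question
# (-size 432x720, rgba, -depth 8); adjust if your figure size differs.
WIDTH, HEIGHT, CHANNELS = 432, 720, 4
FRAME_BYTES = WIDTH * HEIGHT * CHANNELS  # bytes in one raw rgba frame


def frame_count(size, frame_bytes=FRAME_BYTES):
    """Split a byte count into (whole_frames, leftover_bytes)."""
    return divmod(size, frame_bytes)


def inspect_capture(path, frame_bytes=FRAME_BYTES):
    """Return (size_in_bytes, whole_frames, leftover_bytes) for a raw RGBA dump."""
    size = os.path.getsize(path)
    frames, leftover = frame_count(size, frame_bytes)
    return size, frames, leftover


if __name__ == "__main__":
    path = os.path.expanduser("~/plot.rgba")  # where the wrapper script writes
    if os.path.exists(path):
        size, frames, leftover = inspect_capture(path)
        print(f"{size / 1e9:.2f} GB, {frames} frames, {leftover} leftover bytes")
    else:
        print(f"{path} not found; run the animation with the wrapper in place first")
```

For 3,000 frames you would expect 3000 whole frames and 0 leftover bytes.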

Once matplotlib has finished, you can process the file with the real ImageMagick, here with a 10GB memory limit:

convert.real -limit memory 10GB -size 432x720 -depth 8 -delay 3.33 -loop 0 rgba:$HOME/plot.rgba Files/Capacitance/SpherePlateAnimation.gif

You could also consider splitting the file into 2 (or 4) pieces with dd, processing the pieces in parallel and then appending the results, to see if that helps. Ask if you want to investigate that option.
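For the split to work, each piece must start and end on a frame boundary, otherwise convert would see misaligned pixels. One way to guarantee that is to use one frame as the dd block size, so that skip and count are measured in frames. The sketch below just prints the dd commands; split_commands and the part file names are hypothetical.

```python
# Hypothetical helper: build dd commands that split the raw capture into N
# pieces along frame boundaries, so each piece is a valid rgba stream on its own.
FRAME_BYTES = 432 * 720 * 4  # one rgba frame at -depth 8 (from the question)


def split_commands(total_frames, parts, src="plot.rgba", frame_bytes=FRAME_BYTES):
    """Return one dd command per piece; bs is one frame, so skip/count are in frames."""
    base, extra = divmod(total_frames, parts)
    commands, start = [], 0
    for i in range(parts):
        count = base + (1 if i < extra else 0)  # spread any remainder over the first pieces
        commands.append(
            f"dd if={src} of=part{i}.rgba bs={frame_bytes} skip={start} count={count}"
        )
        start += count
    return commands


for cmd in split_commands(3000, 4):
    print(cmd)
```

Each partN.rgba could then be run through the convert.real command above (using rgba:partN.rgba as the input) in its own terminal.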

Original Answer

I am kind of thinking out loud here in the hope that it either helps you directly or jogs someone else's brain into grasping the problem...

It seems from the command line you have shared that matplotlib is writing directly to the stdin of ImageMagick's convert tool. I can see that from the rgba:- parameter, which tells me it is sending red, green, blue and alpha (transparency) values as raw bytes on stdin.

That means there are no intermediate files that I could suggest placing on a RAM disk, which is where I was heading with my comment...

The second thing is that, because raw pixel data is being sent, every pixel of every frame is computed by matplotlib and sent to convert, so the amount of data convert receives is the same regardless of the 5,000 points in your simulation. There is therefore no point reducing or optimising the number of points to speed up the convert stage.

Another thing to note is that you are using the 16-bit quantum depth build of ImageMagick (Q16 in your version string). That effectively doubles the memory requirement, so if you can easily recompile ImageMagick with an 8-bit quantum depth (Q8), that may help.

Now let's look at that input stream. rgba with -depth 8 means 4 bytes per pixel, and at 432x720 pixels per frame that is about 1.24MB per frame. With 3,000 frames that makes about 3.7GB minimum, plus the 70MB output file. I suspect this is over ImageMagick's default memory limit and that is why it slows down at the end, so my suggestion would be to check the memory limits on ImageMagick and consider increasing them to 4GB-6GB, or more if you have it.
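The arithmetic above can be reproduced with a few lines of Python. The frame geometry comes from the convert command line in the question; the 2-bytes-per-channel figure for a Q16 build is an assumption matching the "doubles the memory requirement" remark, and it ignores any other overhead ImageMagick may have.

```python
# Back-of-the-envelope memory estimate for the convert pipeline.
# Frame size and count come from the question; 2 bytes per channel for Q16
# is an assumption (it is what "Q16 doubles the memory requirement" implies).
WIDTH, HEIGHT = 432, 720  # from "-size 432x720"
CHANNELS = 4              # rgba: red, green, blue, alpha
FRAMES = 3000             # number of frames in the animation


def raw_bytes_per_frame(width, height, channels=CHANNELS, bytes_per_channel=1):
    """Bytes for one frame; -depth 8 means 1 byte per channel on the pipe."""
    return width * height * channels * bytes_per_channel


per_frame = raw_bytes_per_frame(WIDTH, HEIGHT)
total_raw = per_frame * FRAMES
# A Q16 build keeps each channel in 2 bytes, so the pixel data held in memory
# is roughly double the raw 8-bit stream.
total_q16 = raw_bytes_per_frame(WIDTH, HEIGHT, bytes_per_channel=2) * FRAMES

GiB = 1024 ** 3
print(f"per frame : {per_frame:,} bytes")                                  # 1,244,160 bytes
print(f"raw stream: {total_raw / 1e9:.2f} GB ({total_raw / GiB:.2f} GiB)")  # 3.73 GB (3.48 GiB)
print(f"Q16 pixels: {total_q16 / 1e9:.2f} GB ({total_q16 / GiB:.2f} GiB)")  # 7.46 GB (6.95 GiB)
```

If the Q16 estimate is right, the pixel data is well beyond the 2GiB default shown in the listing below.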

To check the memory and other resource limits:

identify -list resource

Resource limits:
  Width: 214.7MP
  Height: 214.7MP
  Area: 4.295GP
  Memory: 2GiB    <---
  Map: 4GiB
  Disk: unlimited
  File: 192
  Thread: 1
  Throttle: 0
  Time: unlimited

As you cannot raise the memory limits on the command line that matplotlib executes, you could do it via an environment variable that you export before starting IPython (so that the convert process it launches inherits it), like this:

export MAGICK_MEMORY_LIMIT=4294967296

identify -list resource

Resource limits:
  Width: 214.7MP
  Height: 214.7MP
  Area: 4.295GP
  Memory: 4GiB    <---
  Map: 4GiB
  Disk: unlimited
  File: 192
  Thread: 1
  Throttle: 0
  Time: unlimited
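If restarting the notebook server is inconvenient, the same variable can probably be set from inside the notebook, before calling ani.save. This sketch assumes that matplotlib launches convert as an ordinary child process without overriding its environment, so the child inherits os.environ. The helper names are hypothetical; child_sees only starts a throwaway Python process to demonstrate the inheritance.

```python
import os
import subprocess
import sys


def set_magick_memory_limit(n_bytes):
    """Set ImageMagick's memory limit (in bytes) for convert processes started later.

    Assumption: the writer spawns convert as a child of this Python process,
    and child processes inherit os.environ.
    """
    os.environ["MAGICK_MEMORY_LIMIT"] = str(n_bytes)


def child_sees(name):
    """Start a fresh Python child process and report the value of `name` it inherits."""
    out = subprocess.check_output(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, ''))"]
    )
    return out.decode().strip()


set_magick_memory_limit(4 * 1024 ** 3)   # 4GiB, same value as the export above
print(child_sees("MAGICK_MEMORY_LIMIT"))  # a child process now sees 4294967296
```

Running !identify -list resource in a notebook cell afterwards should then show the raised Memory line, since IPython's shell escape also starts a child process.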

You can also change it in your policy.xml file, but that is more involved, so please try this way initially and ask if you get stuck!

Please let me know how this goes, as I may be able to suggest other things depending on whether it works. Please also run identify -list configure, then edit your question and paste the output there.