2
votes

In my project, I periodically use pickling to represent the internal state of the process for persistence. As a part of normal operation, references to objects are added to and removed from multiple other objects.

For example, a Person might have an attribute called address_list (a list) that contains the Address objects representing all the properties they are trying to sell. Another object, a RealEstateAgent, might have an attribute called addresses_for_sale (also a list) that contains the same type of Address objects, but only those listed at their agency.

If a seller takes their property off the market, or it is sold, the Address is removed from both lists.

Both Persons and RealEstateAgents are held in a central list object (Masterlist), which is what gets pickled. My problem is that as I add and remove properties and pickle the Masterlist object repeatedly over time, the pickle file keeps growing, even when I have removed (with del, actually) more properties than I have added. I realize that pickling Masterlist involves a circular reference; there are many circular references in my application.

I examined the pickle file using pickletools.dis(), and while the output is hard for a human to read, I see references to Addresses that have been removed. I am sure they are removed because, even after unpickling, they do not appear in their respective lists.
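To make the symptom concrete, here is a minimal sketch of how it can arise. The class bodies and the `history` attribute are hypothetical (my real classes are more involved); the point is only that an Address removed from both lists still ends up in the pickle if anything else reachable from Masterlist keeps a reference to it. The disassembly from pickletools.dis() is captured into a string so it can be searched.

```python
import io
import pickle
import pickletools


class Address:
    def __init__(self, street):
        self.street = street


class Person:
    def __init__(self):
        self.address_list = []
        # Hypothetical extra attribute: an easily forgotten place
        # that also keeps references to Address objects.
        self.history = []


class RealEstateAgent:
    def __init__(self):
        self.addresses_for_sale = []


person, agent = Person(), RealEstateAgent()
masterlist = [person, agent]

addr = Address("1 Example Street")
person.address_list.append(addr)
agent.addresses_for_sale.append(addr)
person.history.append(addr)  # the reference that gets overlooked

# Take the property off the market: remove it from both visible lists.
person.address_list.remove(addr)
agent.addresses_for_sale.remove(addr)
del addr

data = pickle.dumps(masterlist)

# Disassemble into a string instead of stdout so it can be searched.
out = io.StringIO()
pickletools.dis(data, out=out)
still_pickled = "1 Example Street" in out.getvalue()

restored = pickle.loads(data)
print("Address still in pickle:", still_pickled)
print("address_list after unpickling:", restored[0].address_list)
print("addresses_for_sale after unpickling:", restored[1].addresses_for_sale)
```

Both lists come back empty after unpickling, yet the Address is still serialized through the other attribute.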

While the application functions correctly before and after pickling/unpickling, the growing filesize is an issue as the process is meant to be long running, and reinitializing it is not an option.

My example is notional, and it might be a stretch to ask, but I'm wondering if anyone has experience with garbage collection issues in pickles that contain circular references, or with anything else that might point me in the right direction for debugging this. Pointers to helpful tools are welcome too.

Many thanks

1
I ultimately used objgraph and heapy to find the source of the leak - many thanks. Oh, and the references were actually still in the object once loaded into memory, just not where I expected them. This is a great tool for complex object relationships! - domoarigato

1 Answer

2
votes

You might want to try objgraph. It can seriously help you track down memory leaks, circular references, and pointer relationships between objects.

http://mg.pov.lt/objgraph/
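If you want a feel for this kind of search before installing anything, here is a rough standard-library sketch. It is not objgraph's API, just a stand-in built on the gc module (which, as far as I know, objgraph also builds on). The classes, the `history` attribute, and the `find_holders` helper are all hypothetical names made up for illustration.

```python
import gc


class Address:
    def __init__(self, street):
        self.street = street


class Person:
    def __init__(self):
        self.address_list = []
        self.history = []  # hypothetical attribute that still holds removed addresses


def find_holders(target):
    """Return sorted (owner type name, attribute name) pairs whose
    attribute is, or directly contains, the target object."""
    found = set()
    for obj in gc.get_objects():
        attrs = getattr(obj, "__dict__", None)
        if not isinstance(attrs, dict):
            continue
        for name, value in list(attrs.items()):
            if value is target:
                found.add((type(obj).__name__, name))
            elif isinstance(value, (list, tuple, set, frozenset)):
                if any(item is target for item in value):
                    found.add((type(obj).__name__, name))
    return sorted(found)


person = Person()
addr = Address("1 Example Street")
person.address_list.append(addr)
person.history.append(addr)

# "Remove" the address the way the application thinks it does.
person.address_list.remove(addr)
del addr

# Every Address instance the garbage collector still tracks.
leaked = [o for o in gc.get_objects() if type(o) is Address]
holders = find_holders(leaked[0]) if leaked else []

print("live Address objects:", len(leaked))
for owner, attribute in holders:
    print("held by", owner, "via", attribute)
```

The holders list will also include the module-level `leaked` list itself, since it refers to the object too. The interesting entry is the one pointing at an attribute you did not expect. objgraph does this sort of thing far more thoroughly and can draw the reference graph for you.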

I use it when debugging pickles (in my own pickling package called dill).

Also, certain pickled objects will (down the pickle chain) pickle globals, which is often a cause of circular references within pickled objects.

I also have a suite of pickle debugging tools in dill. See dill.detect at https://github.com/uqfoundation, where there are several methods that can be used to diagnose objects you are trying to pickle. For instance, if you set dill.detect.trace(True), it will print out all the internal calls made to pickle objects while your object is being dumped.