
I have a large XML file to parse with code like the following sample. The issue seems to be that the memory allocated to childnode (IXMLNode) is not released, even when childnode falls out of scope. The memory only seems to be released once the parent TXMLDocument is deactivated (Active:=false) or freed. So my process, which starts at around 380 MB once the XML document is loaded, grows to 2 GB and fails there. Setting childnode to nil has no effect on memory usage.

My question is how to explicitly release the memory allocated to the IXMLNode interfaces. I'm not open to using a different XML object and I think I've tried almost every way to control the scope of the node interfaces.

var
  childnode: IXMLNode;

for i:=0 to rootnode.ChildNodes.Count-1 do begin
    childnode:=rootnode.ChildNodes[i];
    ...
    childnode:=nil;
end;
Do you have an edition of Delphi XE that includes AQTime? If so, it will tell you exactly what is using up the memory. - Warren P

Are you using the Open XML implementation? Of the three implementations (Open XML, Xerces XML, and MSXML) that are included out of the box, the Open XML implementation is probably the fastest, but it uses roughly twice as much memory as the other two. That said, there will always be a limit to how big a document you can load with TXmlDocument, because it essentially loads the whole thing. For a really large document you might need to go with a SAX parser (non-TXmlDocument) solution. - Misha

In the past I solved the memory management issues by using XPath to locate and parse a group of nodes, instead of iterating over all the nodes at once. - RRUZ

@Gerard, there is a small memory leak in the latest Open XML implementation, but it is not per node. The other implementations do not leak memory. Perhaps reading the ChildNode creates a cached reference in memory that is only freed when the document is closed? This would make sense and ensure fast access to a node that had already been read. - Misha

SAX is the way to go. - David Heffernan

1 Answer


I know you said you didn't want a separate XML library, but maybe someone else would like the sample code:

var
   sax: SAXXMLReader60;
   stm: IStream;
begin
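   //Get a stream around our large file.
   //soOwned makes the adapter free the TFileStream when the last IStream reference is released
   //(the default, soReference, would leave the file stream and its handle allocated).
   stm := TStreamAdapter.Create(TFileStream.Create('USGovBudgetLineItems2008.xml', fmOpenRead), soOwned);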

   sax := CoSAXXMLReader60.Create;
   sax.contentHandler := TVBSAXContentHandler.Create;
   sax.parse(stm);
end;

And we listen for the events with our TVBSAXContentHandler object.

For all the IDispatch methods you can return E_NOTIMPL (MSXML doesn't even call them).

All the rest you can plug in whatever code you want:

TVBSAXContentHandler = class(TInterfacedObject, IVBSAXContentHandler)
protected
    { IDispatch }
    function GetTypeInfoCount(out Count: Integer): HResult; stdcall;
    function GetTypeInfo(Index, LocaleID: Integer; out TypeInfo): HResult; stdcall;
    function GetIDsOfNames(const IID: TGUID; Names: Pointer; NameCount, LocaleID: Integer; DispIDs: Pointer): HResult; stdcall;
    function Invoke(DispID: Integer; const IID: TGUID; LocaleID: Integer; Flags: Word; var Params; VarResult, ExcepInfo, ArgErr: Pointer): HResult; stdcall;
public
    { IVBSAXContentHandler }
    procedure Set_documentLocator(const Param1: IVBSAXLocator); safecall;
    procedure startDocument; safecall;
    procedure endDocument; safecall;
    procedure startPrefixMapping(var strPrefix: WideString; var strURI: WideString); safecall;
    procedure endPrefixMapping(var strPrefix: WideString); safecall;
    procedure startElement(var strNamespaceURI: WideString; var strLocalName: WideString;
                                var strQName: WideString; const oAttributes: IVBSAXAttributes); safecall;
    procedure endElement(var strNamespaceURI: WideString; var strLocalName: WideString;
                             var strQName: WideString); safecall;
    procedure characters(var strChars: WideString); safecall;
    procedure ignorableWhitespace(var strChars: WideString); safecall;
    procedure processingInstruction(var strTarget: WideString; var strData: WideString); safecall;
    procedure skippedEntity(var strName: WideString); safecall;
//      property documentLocator: IVBSAXLocator write Set_documentLocator;
end;
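
As a rough sketch of what the method bodies might look like (not taken from the original answer): the four IDispatch methods simply return E_NOTIMPL, and startElement is shown with an illustrative body that writes each element name to the debugger output. This is a fragment meant for the implementation section of the unit that declares the class above; it assumes the Windows unit is in the uses clause (for E_NOTIMPL and OutputDebugStringW), and every remaining IVBSAXContentHandler method still needs at least an empty body before the unit will compile.

```pascal
{ IDispatch stubs: per the note above, MSXML does not call these. }

function TVBSAXContentHandler.GetTypeInfoCount(out Count: Integer): HResult;
begin
    Count := 0;
    Result := E_NOTIMPL;
end;

function TVBSAXContentHandler.GetTypeInfo(Index, LocaleID: Integer; out TypeInfo): HResult;
begin
    Pointer(TypeInfo) := nil;
    Result := E_NOTIMPL;
end;

function TVBSAXContentHandler.GetIDsOfNames(const IID: TGUID; Names: Pointer; NameCount, LocaleID: Integer; DispIDs: Pointer): HResult;
begin
    Result := E_NOTIMPL;
end;

function TVBSAXContentHandler.Invoke(DispID: Integer; const IID: TGUID; LocaleID: Integer; Flags: Word; var Params; VarResult, ExcepInfo, ArgErr: Pointer): HResult;
begin
    Result := E_NOTIMPL;
end;

{ Illustrative handler body (an assumption, not the original author's code):
  called once per opening tag as the reader streams through the file. }
procedure TVBSAXContentHandler.startElement(var strNamespaceURI: WideString; var strLocalName: WideString;
                            var strQName: WideString; const oAttributes: IVBSAXAttributes);
begin
    OutputDebugStringW(PWideChar(strLocalName));
end;
```

Because the handler only sees one element at a time, nothing accumulates in memory unless your own code stores it, which is the point of using SAX for a file this size.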

Note: Any code is released into the public domain. No attribution required.