4
votes

I am using Delphi 2009 with Unicode strings.

I'm trying to Encode a very large file to convert it to Unicode:

var
  Buffer: TBytes;
  Value: string;

Value := Encoding.GetString(Buffer);

This works fine for a Buffer of 40 MB that gets doubled in size and returns Value as an 80 MB Unicode string.

When I try this with a 300 MB Buffer, it gives me an EOutOfMemory exception.

Well, that wasn't totally unexpected. But I decided to trace it through anyway.

It goes into the DynArraySetLength procedure in the System unit. In that procedure, it goes to the heap and calls ReallocMem. To my surprise, it successfully allocates 665,124,864 bytes!!!

But then towards the end of DynArraySetLength, it calls FillChar:

  // Set the new memory to all zero bits
  FillChar((PAnsiChar(p) + elSize * oldLength)^, elSize * (newLength - oldLength), 0);

You can see by the comment what that is supposed to do. There is not much to that routine, but that is the routine that causes the EOutOfMemory exception. Here is FillChar from the System unit:

procedure _FillChar(var Dest; count: Integer; Value: Char);
{$IFDEF PUREPASCAL}
var
  I: Integer;
  P: PAnsiChar;
begin
  P := PAnsiChar(@Dest);
  for I := count-1 downto 0 do
    P[I] := Value;
end;
{$ELSE}
asm                                  // Size = 153 Bytes
        CMP   EDX, 32
        MOV   CH, CL                 // Copy Value into both Bytes of CX
        JL    @@Small
        MOV   [EAX  ], CX            // Fill First 8 Bytes
        MOV   [EAX+2], CX
        MOV   [EAX+4], CX
        MOV   [EAX+6], CX
        SUB   EDX, 16
        FLD   QWORD PTR [EAX]
        FST   QWORD PTR [EAX+EDX]    // Fill Last 16 Bytes
        FST   QWORD PTR [EAX+EDX+8]
        MOV   ECX, EAX
        AND   ECX, 7                 // 8-Byte Align Writes
        SUB   ECX, 8
        SUB   EAX, ECX
        ADD   EDX, ECX
        ADD   EAX, EDX
        NEG   EDX
@@Loop:
        FST   QWORD PTR [EAX+EDX]    // Fill 16 Bytes per Loop
        FST   QWORD PTR [EAX+EDX+8]
        ADD   EDX, 16
        JL    @@Loop
        FFREE ST(0)
        FINCSTP
        RET
        NOP
        NOP
        NOP
@@Small:
        TEST  EDX, EDX
        JLE   @@Done
        MOV   [EAX+EDX-1], CL        // Fill Last Byte
        AND   EDX, -2                // No. of Words to Fill
        NEG   EDX
        LEA   EDX, [@@SmallFill + 60 + EDX * 2]
        JMP   EDX
        NOP                          // Align Jump Destinations
        NOP
@@SmallFill:
        MOV   [EAX+28], CX
        MOV   [EAX+26], CX
        MOV   [EAX+24], CX
        MOV   [EAX+22], CX
        MOV   [EAX+20], CX
        MOV   [EAX+18], CX
        MOV   [EAX+16], CX
        MOV   [EAX+14], CX
        MOV   [EAX+12], CX
        MOV   [EAX+10], CX
        MOV   [EAX+ 8], CX
        MOV   [EAX+ 6], CX
        MOV   [EAX+ 4], CX
        MOV   [EAX+ 2], CX
        MOV   [EAX   ], CX
        RET                          // DO NOT REMOVE - This is for Alignment
@@Done:
end;
{$ENDIF}

So my memory was allocated, but it crashed trying to fill it with zeros. This doesn't make sense to me. As far as I'm concerned, the memory doesn't even need to be filled with zeros - and that is probably a time waster anyhow - since the Encoding statement is about to fill it anyway.

Can I somehow prevent Delphi from doing the memory fill?

Or is there some other way I can get Delphi to allocate this memory successfully for me?

My real goal is to do that Encoding statement for my very large file, so any solution that will allow this would be much appreciated.


Conclusion: See my comments on the answers.

This is a warning to be careful in debugging assembler code. Make sure you break on all the "RET" lines, since I missed the one in the middle of the FillChar routine and erroneously concluded that FillChar caused the problem. Thanks Mason, for pointing this out.

I will have to break the input into Chunks to handle the very large file.

4

4 Answers

6
votes

FillChar isn't allocating any memory, so that's not your problem. Try tracing into it and placing breakpoints at the RET statements, and you'll see that the FillChar finishes. Whatever the problem is, it's probably in a later step.

5
votes

Read a chunk from the file, encode and write to another file, repeat.

1
votes

A wild guess: Could the problem be memory being overcommitted and when the FillChar actually accesses the memory it can't find a page to actually give you? I don't know if Windows will even overcommit memory, I do know that some OSes do--you don't find out about it until you actually try to make use of the memory.

If this is the case it could cause the blowup in FillChar.

1
votes

Programs are great at looping. They loop tirelessly without complaining.

Allocating a huge amount of memory takes time. There will be many calls to the heap manager. Your OS won't even know if it has the amount of contiguous memory that you need ahead of time. Your OS says, yeah, I have 1 GB free. But as soon as you go to use it, your OS says, wait, you want all of it in one chunk? Let me make sure I have enough all in one place. If it doesn't you get the error.

If it does have the memory, well, there's still a lot of work for the heap manager in preparing the memory and marking it as used.

So, obviously, it makes some sense to allocate less memory and simply loop through it. This saves the computer from doing a lot of work that it will only have to undo when it's done. Why not have it do just a little bit of work in setting aside your memory, then just keep re-using it?

Stack memory is allocated much faster than heap memory. If you keep your memory usage small (under 1 MB, by default), the compiler may just use stack memory over heap memory, which will make your loops even faster. In addition, local variables that get allocated in the register are very fast.

There are factors such as hard drive cluster and cache sizes, CPU cache sizes, and things, that offer hints about the best chunk sizes. The key is to find a good number. I like to use 64 KB chunks.