I am working on an older MFC/C++ project that parses large text files and uses MFC's CString class to handle strings. I noticed that during parsing, many small pieces are appended to one large CString object, like this:

//'strContainer' = CString
//'tag' = CString of a much smaller size
strContainer += L"<" + tag + L">";

The statement above seems to get noticeably slower once the strContainer variable grows past a certain size. I suppose this happens because of the frequent memory reallocation done by the += operator.

So I was curious, is there any way to improve this?

PS1. I do not know the size of the result string up front to pre-allocate it.

PS2. I have to stick with CString due to the complexity of the project itself. (That is, I can't switch to Boost or other newer implementations.)

Actually, the += is probably the fast part. The performance issue is probably the + of the temporaries. As such, do += three times and see if that makes a difference. - Mooing Duck
@MooingDuck: The issue with measuring this performance is that by itself a single operator would not be something that a human can notice. It's when I run several thousands of them in the parsing method, that's when it becomes visible. In either case, I'm positive that it's one of the CString operators that does memory re-allocation. I conclude it because its performance decreases exponentially with the size of CString variable. - c00000fd
Even if you can't preallocate an optimal amount, you could still preallocate enough to cover a majority of your cases. Memory is probably a lot less of a concern than when the program was first written. - Mark Ransom
Is this code compiled with optimisation (not debug mode)? I have written code that deals with seriously large strings (admittedly not CString, but I doubt it is THAT inefficient). In debug builds, it's almost certainly horribly slow because of all sorts of extra checking. - Mats Petersson
@c00000fd: Just call Preallocate before you start appending, and give it a number that's big enough for 95% of your data. Super easy. (The other 5% it will reallocate like it did before, but usually only once, not 10 times) - Mooing Duck

1 Answer

With std::string, += is usually quite fast, as it can just copy bytes into an already allocated buffer. The expression L"<" + tag + L">", on the other hand, will usually require three or more memory allocations for its temporaries, all of which become unnecessary if you simply replace that line of code with three separate += statements. Additionally, allocations are REALLY REALLY SLOW if you have Visual Studio start the program for you, even for release builds. Run your program manually, without Visual Studio, and see if that solves your performance problems.
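To make the "three +=" suggestion concrete, here is a minimal sketch. It uses std::wstring as a portable stand-in for CString (an assumption, so it compiles without MFC), and the function names are made up for illustration. With CString the three-line form should look the same.

```cpp
#include <string>

// Stand-in for the original line: the right-hand side builds at least one
// temporary string before the result is appended to the container.
void appendTagChained(std::wstring& container, const std::wstring& tag) {
    container += L"<" + tag + L">";
}

// Suggested rewrite: three in-place appends, no temporary strings.
void appendTagInPlace(std::wstring& container, const std::wstring& tag) {
    container += L"<";
    container += tag;
    container += L">";
}
```

Both functions produce the same text; only the number of intermediate strings differs. Whether that difference is what you are seeing is something only a profile of the real parser can confirm.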

I dug into the MFC source to verify this (and dug and dug and dug...) and found that ATL::CSimpleStringT::PrepareWrite2(int nLength) will grow the buffer exponentially (1.5x bigger on each allocation). That is completely normal, and std::string does the same, except for one thing:
if the MFC string is over 1G, it only adds 1M on each allocation after that.
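The growth rule described above can be modelled in a few lines. This is a hypothetical model written from that description, not a copy of the MFC source; the names are invented, and the 1G and 1M thresholds are taken as plain counts in whatever unit the string uses.

```cpp
#include <cstdint>

// Hypothetical model of the described growth rule; NOT the MFC source.
constexpr std::int64_t kOneM = 1024LL * 1024;
constexpr std::int64_t kOneG = 1024LL * kOneM;

// Capacity chosen when 'current' is too small to hold 'needed'.
std::int64_t modelNextCapacity(std::int64_t current, std::int64_t needed) {
    // Below the threshold grow by half; above it, add a fixed 1M.
    std::int64_t grown = (current > kOneG) ? current + kOneM
                                           : current + current / 2;
    // Never return less than what the caller actually needs.
    return grown < needed ? needed : grown;
}

// Number of reallocations to get from 'start' to at least 'target'
// when the string grows by one small append at a time.
int modelReallocationCount(std::int64_t start, std::int64_t target) {
    int count = 0;
    std::int64_t cap = start;
    while (cap < target) {
        cap = modelNextCapacity(cap, cap + 1);
        ++count;
    }
    return count;
}
```

Under this model, growing from 64 to 1000 takes 7 reallocations, while adding 10M on top of a 2G buffer takes 10, each of which would have to copy the whole buffer.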

So there are two cases: if strContainer is over 1G, you should reserve memory manually (Preallocate a large number of bytes; it doesn't have to be exact, or even greater than the real number).
Otherwise, simply replace the + with +=.
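A rough sketch of the preallocation advice, again using std::wstring as a portable stand-in for CString (an assumption): reserve() plays the role that Preallocate would play on the CString. BuildResult, buildTagged and reserveHint are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct BuildResult {
    std::wstring text;
    int capacityChanges;  // loop iterations after which the capacity had changed
};

// reserveHint is a rough guess at the final length in characters;
// it does not have to be exact.
BuildResult buildTagged(const std::vector<std::wstring>& tags,
                        std::size_t reserveHint) {
    BuildResult result{std::wstring(), 0};
    result.text.reserve(reserveHint);  // CString analog: Preallocate
    std::size_t lastCapacity = result.text.capacity();
    for (const std::wstring& tag : tags) {
        // Three in-place appends instead of one chained + expression.
        result.text += L"<";
        result.text += tag;
        result.text += L">";
        if (result.text.capacity() != lastCapacity) {
            ++result.capacityChanges;
            lastCapacity = result.text.capacity();
        }
    }
    return result;
}
```

Calling buildTagged with a hint that covers the final size should report no capacity changes, while a hint of 0 shows how often the buffer had to grow for the same input.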