4
votes

I have inherited an old Borland C++ Builder application which I now must migrate to a new development tool. The suggested way to go is with Embarcadero C++ Builder, and from my initial tests it seems like a rather smooth transition.

I do however have one problem to which I'm hoping there is a simple solution:

The application parses a large number of text files. These files are all ANSI-based, and that will never change, so it is ANSI in and ANSI out. The main problem I have is that with Embarcadero C++, the type string is now a UnicodeString instead of an AnsiString (as it was in Borland C++ Builder).

Using Unicode in this application is not an option - the files it works with are ANSI formatted. Modifying the code to use AnsiString (and similar) is doable, but I'd rather not since it uses a lot of TStringList (and similar) constructs.

So my question is: Is there a setting or compiler option or something that I can use to tell Embarcadero to use System.AnsiString as definition for string instead of System.UnicodeString?

This is probably a long-shot, but the RAD Studio XE (which is the older version that I have borrowed to make a few tests) documentation says "by default, the type string is now a Unicode string", which implies that this can be changed. That is however rephrased in the documentation for the current version (XE8), so...

3
There's nothing wrong with using the UnicodeString for this purpose. UnicodeString and AnsiString are both specializations of the template AnsiStringT, where the template parameter is the code page. – M.M

3 Answers

6
votes

I have inherited an old Borland C++ Builder application which I now must migrate to a new development tool. The suggested way to go is with Embarcadero C++ Builder

Yes. They are actually the same product. Borland created a child company named CodeGear to manage its developer tools (Delphi, C++Builder, etc), and then Embarcadero later bought CodeGear.

The main problem I have is that with Embarcadero C++, the type string is now a UnicodeString instead of an AnsiString (as it was in Borland C++ Builder).

string (lowercase s) refers to the STL's std::string class, which is still char-based. You are thinking of C++Builder's System::String alias, which does now map to System::UnicodeString instead of System::AnsiString (that change was made in C++Builder 2009, when UnicodeString was introduced). However, AnsiString still exists and can be used directly.

Using Unicode in this application is not an option - the files it works with are ANSI formatted.

Then don't use UnicodeString to process them. Continue using AnsiString instead.

Modifying the code to use AnsiString (and similar) is doable, but I'd rather not since it uses a lot of TStringList (and similar) constructs.

That, on the other hand, would be a problem, yes. Most of the RTL only supports UnicodeString now. So code using TStringList will have to be re-written, such as by using TList<AnsiString> or std::vector<AnsiString> instead (unless the code is utilizing the TStringList::(Comma|Delimited)Text properties, in which case you have a bigger re-write). However, for AnsiString parsing code, many of the older AnsiString-based RTL functions were moved to a separate System.AnsiStrings unit, so you can add #include <System.AnsiStrings.hpp> to your code to reach them.
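As a rough illustration of the container swap suggested above, here is a minimal sketch that holds lines in a std::vector instead of a TStringList. It uses std::string so that it compiles with any C++ compiler; in C++Builder the element type could just as well be AnsiString. The helper name splitLines is hypothetical and not part of the RTL.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an RTL function): split a raw ANSI buffer into
// lines, accepting both "\r\n" and "\n" line endings. The bytes are never
// reinterpreted, so characters above 0x7F pass through untouched.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < text.size()) {
        std::string::size_type end = text.find('\n', start);
        if (end == std::string::npos)
            end = text.size();            // last line has no trailing newline
        assert(end >= start);             // find() never looks before 'start'
        std::string line = text.substr(start, end - start);
        if (!line.empty() && line.back() == '\r')
            line.pop_back();              // drop the CR of a CRLF pair
        lines.push_back(line);
        start = end + 1;
    }
    return lines;
}
```

Unlike TStringList, this gives you no sorting, name/value lookup or CommaText handling; those would have to be added separately if the old code relies on them.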

So my question is: Is there a setting or compiler option or something that I can use to tell Embarcadero to use System.AnsiString as definition for string instead of System.UnicodeString?

No. And if you think about it, that would be a major undertaking for them to implement. It would require multiple copies of the RTL/VCL/FMX frameworks, two for each supported OS platform, and a lot of internal code would have to be IFDEF'ed to handle differences between Ansi/Unicode processing logic. So it is not really feasible or cost-effective for them to do (and much too late at this point, especially considering that AnsiString is not supported on mobile OS platforms - though there is a third-party patch available to re-enable it).

This is probably a long-shot, but the RAD Studio XE (which is the older version that I have borrowed to make a few tests) documentation says "by default, the type string is now a Unicode string", which implies that this can be changed.

No, it cannot be changed. The RTL/VCL/FMX frameworks are Unicode now. But that does not require your code to be Unicode as well, only the spots where you directly interact with the RTL/VCL/FMX. The rest of your code can continue using AnsiString (or even std::string) as needed.
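To make the "rest of your code" part concrete, here is a small sketch of parsing code that stays entirely on narrow strings, so the file content is ANSI in and ANSI out with no conversion step. It uses std::string for portability; the helper names trimRight and trimLines are made up for this example, and the trimming itself is only a placeholder for whatever processing the real application does.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: remove trailing spaces, tabs and CRs from one line.
std::string trimRight(const std::string& line)
{
    const std::string::size_type last = line.find_last_not_of(" \t\r");
    return last == std::string::npos ? std::string()
                                     : line.substr(0, last + 1);
}

// Hypothetical helper: trim every line of a raw ANSI buffer and rejoin the
// lines with CRLF. Only bytes are handled, so characters above 0x7F are
// copied through exactly as they were read.
std::string trimLines(const std::string& raw)
{
    std::istringstream in(raw);
    std::string out;
    std::string line;
    while (std::getline(in, line)) {
        assert(line.find('\n') == std::string::npos); // getline strips the delimiter
        out += trimRight(line);
        out += "\r\n";
    }
    return out;
}
```

When reading and writing real files this way, opening the streams with std::ios::binary keeps the runtime from translating line endings. A conversion to String is then only needed at the point where a value is handed to a VCL control or RTL function.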

1
votes

I've probably got bad news: the documentation always talks about migration and never mentions a quick fix.

http://docwiki.embarcadero.com/RADStudio/XE3/en/Enabling_Applications_for_Unicode
http://docwiki.embarcadero.com/RADStudio/XE3/en/Enabling_C%2B%2B_Applications_for_Unicode

Well... I hate strings in Borland. Who the hell came up with numbering them from 1 instead of 0?!

0
votes

AnsiStrings can be converted into UnicodeStrings easily. This is how I handled the conversion. Old C++Builder 2007 code:

void __fastcall TFormVidya::lbEntData(TWinControl *Control, int Index, AnsiString &Data)
{
    if(FEntNameSto) {
        char *pc;
        int len=FEntNameSto->PeekValue(Index,&pc);
        Data.printf("DB %.*s",len,pc);
    } else Data.sprintf("MOCK %d!",Index);
}

Converted for C++Builder XE2:

void __fastcall TFormVidya::lbEntData(TWinControl *Control, int Index, UnicodeString &Data)
{
    if(FEntNameSto) {
        char *pc;
        int len=FEntNameSto->PeekValue(Index,&pc);
        AnsiString astr;
        astr.printf("DB %.*s",len,pc);
        Data=astr;
    } else Data.sprintf(L"MOCK %d!",Index);
}

The essence is the assignment of an AnsiString to a UnicodeString: Data=astr;.

The help page ms-help://embarcadero.rs_xe2/libraries/System.UnicodeString.html (the one that says "By default, variables declared as type String are UnicodeString.") also says "Despite its name, UnicodeString can represent both ANSI character set strings and Unicode strings.", but I could not make any use of that.