5
votes

I maintain a Delphi program which uses typed binary files as its native file format. After upgrading from Turbo Delphi to Delphi 2010, every char in the stored record type started taking 2 bytes rather than one.

The data types being stored are char and array[1..5] of char.

So before, part of the file looked like:

4C 20 20 20 4E 4E 4E 4E

Now it looks like:

4C 00 20 00 20 00 20 00 4E 00 4E 00 4E 00 4E 00

First of all, why did this happen in the first place?

Secondly, how can I still read my files, keeping in mind that there are now old files and new files floating around in the universe?

I will monitor this question obsessively after lunch. Feel free to ask for more information in comments.

2
Another question: how can such a file format error be avoided in the future? - I highly recommend unit tests, for example with the open source DUnit test framework. - mjn
unit tests don't protect against this. File headers and version numbers do! - Cosmin Prund
@Cosmin: why should a regression test like CheckFileEquals(Expected, Actual) not be helpful? - mjn

2 Answers

11
votes

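This happened when the default string type was changed from AnsiString to UnicodeString in Delphi 2009. If you were writing strings to the file, redeclare them in the record with an explicit ANSI type and it should work fine. (A typed-file record cannot hold a long string such as AnsiString, so in practice this means fixed-size fields such as arrays of AnsiChar.)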

Same goes for char. The original char was an AnsiChar, one byte per character. Now the default char is a WideChar, which holds a UTF-16 code unit, 2 bytes each. Redeclare your char fields as AnsiChar and your char arrays as arrays of AnsiChar and you'll get your old file style back.
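To make the size difference concrete, here is a minimal sketch. The record and field names are hypothetical (the question does not show the real declaration), and both records are declared packed so the sizes are unambiguous:

```pascal
program RecSizes;
{$APPTYPE CONSOLE}

type
  // Hypothetical legacy layout: explicit single-byte characters.
  TLegacyRec = packed record
    Kind: AnsiChar;                 // 1 byte in every Delphi version
    Code: array[1..5] of AnsiChar;  // 5 bytes
  end;

  // The same declaration written with plain Char.
  TDefaultRec = packed record
    Kind: Char;                     // Char = WideChar in Delphi 2009+: 2 bytes
    Code: array[1..5] of Char;      // 10 bytes in Delphi 2009+
  end;

begin
  Writeln(SizeOf(TLegacyRec));   // 6
  Writeln(SizeOf(TDefaultRec));  // 12 in Delphi 2009 and later
end.
```

A `file of TLegacyRec` therefore keeps the old on-disk layout, while a `file of TDefaultRec` produces the doubled layout shown in the question.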

As for being aware that both styles exist, that's a mess. Unless there's something like a version number in the file that was changed when you upgraded your Delphi version, I suppose the only thing you can do is scan for 00 bytes in the character data and then read either an AnsiChar or a WideChar version of the record, depending on whether any are found.
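The 00-byte scan could look roughly like this. It is only a heuristic sketch: the function name is made up, and it assumes the bytes passed in are character data containing plain ASCII text, so that every UTF-16 code unit has a zero high byte.

```pascal
program DetectWide;
{$APPTYPE CONSOLE}

// Hypothetical helper: True when every second byte is 00, which is what
// ASCII text looks like when stored as little-endian UTF-16.
function LooksLikeUtf16(const Buf: array of Byte): Boolean;
var
  I: Integer;
begin
  Result := Length(Buf) >= 2;   // need at least one 2-byte unit to judge
  I := 1;                       // open arrays are 0-based: check offsets 1, 3, 5, ...
  while Result and (I < Length(Buf)) do
  begin
    Result := (Buf[I] = 0);
    Inc(I, 2);
  end;
end;

const
  // The two byte sequences quoted in the question.
  OldBytes: array[0..7] of Byte = ($4C, $20, $20, $20, $4E, $4E, $4E, $4E);
  NewBytes: array[0..15] of Byte = ($4C, $00, $20, $00, $20, $00, $20, $00,
                                    $4E, $00, $4E, $00, $4E, $00, $4E, $00);
begin
  Writeln(LooksLikeUtf16(OldBytes));  // FALSE
  Writeln(LooksLikeUtf16(NewBytes));  // TRUE
end.
```

In a real program you would read the first few character bytes of the file into a buffer, run the check, and then open the file with the matching record type. Non-ASCII characters in new-style files would defeat this test, so a version header in future files is the more robust route.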

0
votes

In your code, change the string type declarations to AnsiString and the char type declarations to AnsiChar. The data will then use the same encoding as in previous versions of Delphi. The AnsiString and AnsiChar types also exist in previous versions, so the same source compiles there too. There is no global compiler switch for this, so each declaration has to be changed explicitly. After reading, convert the AnsiString/AnsiChar data to a Unicode string.

Here are two examples doing the same thing, one using an array of AnsiChar, the other reading directly into an AnsiString. Both return a generic Unicode string:

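function Read5(S: TStream): string;
var chars: array[1..5] of AnsiChar;
    tmp: AnsiString;
    i: integer;
begin
  // Read exactly 5 single-byte characters; ReadBuffer raises on a short read
  S.ReadBuffer(chars, SizeOf(chars));
  tmp := '';
  for i := 1 to 5 do
    tmp := tmp + chars[i];
  // Explicit AnsiString -> UnicodeString conversion
  result := string(tmp);
end;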


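function Read5(S: TStream): string;
var tmp: AnsiString;
begin
  SetLength(tmp, 5);
  // Read straight into the AnsiString's buffer; raises on a short read
  S.ReadBuffer(tmp[1], 5);
  // Explicit AnsiString -> UnicodeString conversion
  result := string(tmp);
end;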

You can use AnsiChar throughout your program without any problem.

But you may run into problems if your AnsiChar values are passed to string functions (like Pos or Copy), since those now work on Unicode strings and trigger implicit conversions.

Always take a close look at the Delphi 2010 compiler warnings, and avoid implicit ANSI-Unicode conversions by making them explicit.
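As a small illustration of keeping the conversion explicit, the sketch below converts once at the boundary and only then calls the string functions. The variable names and sample data are made up for the example:

```pascal
program ExplicitConv;
{$APPTYPE CONSOLE}

var
  Raw: array[1..5] of AnsiChar;  // as it would be stored in the record
  A: AnsiString;
  U: string;                     // UnicodeString in Delphi 2009+
  I: Integer;
begin
  // Sample data standing in for a field read from the file: 'LNNNN'
  Raw[1] := 'L';
  for I := 2 to 5 do
    Raw[I] := 'N';

  // Copy the fixed-size array into an AnsiString, then convert once, explicitly.
  SetString(A, PAnsiChar(@Raw[1]), Length(Raw));
  U := string(A);

  // From here on only Unicode strings reach Pos and Copy.
  Writeln(Pos('N', U));     // 2
  Writeln(Copy(U, 2, 3));   // NNN
end.
```

Doing the cast in one place means the rest of the code never mixes the two string types.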