2
votes

I am running a legacy application, built with Delphi 2007, in which we handle non-English characters by storing each character's byte value as a two-digit hex code in the DB. When reading, we apply Chr() to convert these hex codes back to a string.

String to Hex (before saving to DB):

strHex := Format( '%x', [ Byte( strText[ lIndex ] ) shr 4 ] );
DataStr[ lPos ] := strHex[ 1 ];
inc( lPos );

strHex := Format( '%x', [ Byte( strText[ lIndex ] ) and $0F ] );
DataStr[ lPos ] := strHex[ 1 ];
inc( lPos );

// In short, I am saving the hex code to pcData

Hex to String (after reading from DB):

strText := strText + Chr( StrToInt('$'+ DataStr[lPos] + DataStr[lPos + 1]))

This code started breaking after we moved to Delphi XE7, where string is a UnicodeString; we now have to convert the string to AnsiString explicitly.

Converting the string below to hex:
ТуцЕфылАшдеук8311
In Delphi 2007 gives:
\D2\F3\F6\C5\F4\FB\EB\C0\F8\E4\E5\F3\EA8311
In Delphi XE7 gives:
\22\43\46\1A\33\4B\4B\48\44\42\14\44\49\33\351522


I would like to know the best way I can modify this code such that I can handle my legacy data.

This is exceedingly weird. A very strange encoding that you invented. Did your database only support 7 bit ASCII? Why not have used something standard like UTF-8? What are your goals now? Do you want to read from and write to this strange encoding that you invented? Or just read from it. Don't you want to use Unicode? - David Heffernan
Agreed, this is a solution of madness. From the data you've shown, the original (ANSI) strings were encoded using codepage 1251. You'll need to define an AnsiString of this type and use it when retrieving data from your database. I would strongly suggest that you then immediately convert it to a normal (Unicode) string when using it in your application, converting back only to store in the DB. Better yet, update your DB to Unicode and get rid of this mess altogether. See: stackoverflow.com/a/7222871/327083 - J...
Our application runs on both SqlServer and Oracle, we need to stick to this code to support data that was migrated from older versions. We know moving to UTF8 is the solution and have implemented the same, but I would have to write code to handle the data that is entered from older versions too. - user1897277
You didn't answer any of my questions. - David Heffernan
Yes, my application needs to read this for backward compatibility. If someone edits and saves, we convert the encoding to UTF8; that is a separate flow that I did not mention here. - user1897277

2 Answers

2
votes

First, the simpler way to generate the hex string would have been to use the RTL's own BinToHex() function instead of writing your own conversion code, eg:

var
  ...
  s: AnsiString;
  DataStr: string; 
  lPos: Integer;
  ...
begin
  ...
  s := '...';
  BinToHex(PAnsiChar(s), @DataStr[lPos], Length(s)); 
  Inc(lPos, Length(s)*2);
  ...
end;

Then, you can use HexToBin() to reverse it. And since you are dealing with encoded ANSI data, you can declare an AnsiString variable that has an affinity for the desired codepage encoding (in your case, probably 1251), read the hex code directly into that variable, and then assign/cast it to a normal String and let the RTL handle the conversion to Unicode for you:

type
  Win1251String = type AnsiString(1251);
var
  ...
  tmp: Win1251String;
  DataStr, strText: string;
  lPos: Integer;
  ...
begin
  ...
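  SetLength(tmp, LengthOfHex div 2);  // LengthOfHex = number of hex characters to decode
  HexToBin(@DataStr[lPos], PAnsiChar(tmp), Length(tmp));  // hex text -> raw 1251 bytes
  strText := String(tmp);  // the RTL converts from code page 1251 to Unicode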
  ...
end;
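
For the write side, the same code-page-typed string can make the stored bytes independent of the system locale. The following is only a sketch under assumptions: EncodeLegacyHex is a hypothetical helper name, the legacy data is assumed to be Windows-1251 as suggested above, and the output is plain hex pairs with no separators. It uses IntToHex in a loop instead of BinToHex purely to keep the example short.

```pascal
program EncodeLegacyHexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Assumption: the legacy bytes are Windows-1251
  Win1251String = type AnsiString(1251);

// Hypothetical helper: Unicode text -> 1251 bytes -> two hex digits per byte
function EncodeLegacyHex(const Text: string): string;
var
  Ansi: Win1251String;
  I: Integer;
begin
  // The explicit cast makes the RTL convert UTF-16 to code page 1251
  Ansi := Win1251String(Text);
  Result := '';
  for I := 1 to Length(Ansi) do
    Result := Result + IntToHex(Ord(Ansi[I]), 2);
end;

begin
  // U+0422, U+0443, U+0446 are the first three characters of the sample string
  Writeln(EncodeLegacyHex(#$0422#$0443#$0446)); // prints D2F3F6
end.
```

The result matches the first three byte values of the Delphi 2007 output shown in the question.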
1
votes

According to comments, you just need to decode this data to a native Unicode string. Do that like so:

  1. Read the encoded text from the database into a string variable.
  2. Decode that text into a byte array rather than a string. Your Delphi 2007 code can be used pretty much as is, but it needs to write to a byte array rather than a string.
  3. That byte array is ANSI 1251 encoded. Decode it with TEncoding.GetString. You'll need to create an instance of the TEncoding class with the correct code page: Encoding := TEncoding.GetEncoding(1251).
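
The steps above can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement: DecodeLegacyHex is a hypothetical helper name, and it assumes the input is a contiguous run of hex pairs as produced by the encoding code in the question, with no `\` separators or unencoded characters mixed in. The stored format would need to be checked against real data first.

```pascal
program DecodeLegacyHexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: hex pairs -> byte array -> Unicode string via code page 1251
function DecodeLegacyHex(const Hex: string): string;
var
  Bytes: TBytes;
  Encoding: TEncoding;
  I: Integer;
begin
  // Step 2: turn each pair of hex digits into one byte
  SetLength(Bytes, Length(Hex) div 2);
  for I := 0 to High(Bytes) do
    Bytes[I] := StrToInt('$' + Copy(Hex, 2 * I + 1, 2));

  // Step 3: interpret the bytes as code page 1251 (assumed from the sample data)
  Encoding := TEncoding.GetEncoding(1251);
  try
    Result := Encoding.GetString(Bytes);
  finally
    Encoding.Free; // instances returned by GetEncoding are owned by the caller
  end;
end;

begin
  // D2 F3 F6 are the first three byte values of the Delphi 2007 output in the question
  Writeln(Length(DecodeLegacyHex('D2F3F6'))); // prints 3
end.
```

The demo prints the length rather than the text because console output of Cyrillic characters depends on the console code page.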