1
votes

I'm slowly converting my existing code to Delphi 2010 and have read several of the articles on the Embarcadero web site as well as Marco Cantù's white paper.

There are still some things I haven't understood, so here are two functions to exemplify my question:

function RemoveSpace(InStr: string): string;
var
  Ans     : string;
  I       : Word;
  L       : Word;
  TestChar: string[1];
begin
  Ans := '';
  L := Length(InStr);
  if L > 0 then
  begin
    for I := 1 to L do
    begin
      TestChar := Copy(InStr, I, 1);
      if TestChar <> ' ' then Ans := Ans + TestChar;
    end;
  end;
  RemoveSpace := Ans;
end;

function ReplaceStr(const S, Srch, Replace: string): string;
var
  I: Integer;
  Source: string;
begin
  Source := S;
  Result := '';
  repeat
    I := Pos(Srch, Source);
    if I > 0 then begin
      Result := Result + Copy(Source, 1, I - 1) + Replace;
      Source := Copy(Source, I + Length(Srch), MaxInt);
    end
    else Result := Result + Source;
  until I <= 0;
end;

For the RemoveSpace function, if no Unicode character is passed ('aa bb' for example), all is well. Now if I pass the text 'ab cd' then the function doesn't work as expected (I get ab??cd as the output).

How can I account for possible Unicode characters in a string? Using Length(InStr) is obviously incorrect, as is Copy(InStr, I, 1).

What's the best way of converting this code so that it accounts for Unicode characters?

Thanks!

5
Huh? Word as a type for the Length and Index of strings? In production code since Delphi 3? Besides the solution by Aldo, I'd suggest you make sure that your code compiles without warnings (even better: without hints). - Jeroen Wiert Pluimers
Jeroen, I see nothing in this code that would generate a warning or a hint. Assigning an Integer to a Word never generates a message; it could lead to a range-check error on strings longer than 65535 characters, though. It's common to want to use unsigned types for variables that can never be negative (such as string lengths and indices), and in Delphi 3, Word was the largest unsigned type available. (Truly unsigned Cardinal only came in Delphi 4, with the introduction of Int64.) - Rob Kennedy

5 Answers

14
votes

If those were your REAL functions and you're just trying to get them working, then:

function RemoveSpace(const InStr: string): string;
begin
  Result := StringReplace(InStr, ' ', '', [rfReplaceAll]); 
end;

function ReplaceStr(const S, Srch, Replace: string): string;
begin
  Result := StringReplace(S, Srch, Replace, [rfReplaceAll, rfIgnoreCase]); 
end;
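
One caveat about the flags, shown as a minimal sketch (the program name and sample text are made up for illustration): the Pos-based ReplaceStr in the question matches case-sensitively, whereas rfIgnoreCase makes StringReplace match regardless of case. Drop rfIgnoreCase if you want to keep the original behaviour.

program StringReplaceDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  SysUtils; // StringReplace and the rfXxx flags live here
begin
  // Case-sensitive: only the lowercase 'a' characters are replaced.
  Writeln(StringReplace('Aa aA', 'a', 'x', [rfReplaceAll]));               // Ax xA
  // Case-insensitive: 'A' and 'a' are both replaced.
  Writeln(StringReplace('Aa aA', 'a', 'x', [rfReplaceAll, rfIgnoreCase])); // xx xx
end.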
1
votes

(we do not use D10, at the moment, so beware!)

The problem in Delphi is with string literals that contain characters outside the basic ASCII range. When they are passed to string routines, the non-ASCII characters are replaced with question marks.

To avoid this, cast the text literals to WideStrings before passing them as a parameter to the function.

I do not know whether it applies to the StringReplace routine, but Delphi's search routine Pos/PosEx does not handle Unicode correctly. We had to replace these routines with our own variant. For this improved routine it is important to make sure that the parameters are of the WideString type, not the normal string type.

We did this in D7 when handling Unicode, and all works well.

1
votes

Although string is a Unicode type now, when you specify a length, you still get the non-Unicode ShortString type. The TestChar variable in your RemoveSpace function is a non-Unicode one-character string. What you should have been using all along is a real Char variable. I expect you came from the VB world, where one-character strings were the same as single characters. In Delphi, a string isn't the same as a character, so when you call Copy, you get a string.

In Unicode Delphi, that one-character string gets reduced to a non-Unicode string, and if there's no representation for that character in the current code page, you get a question mark instead. Fix it like this:

function RemoveSpace(const InStr: string): string;
var
  I: Integer;
  TestChar: Char;
begin
  Result := '';
  for I := 1 to Length(InStr) do
  begin
    TestChar := InStr[I];
    if TestChar <> ' ' then
      Result := Result + TestChar;
  end;
end;

I got rid of Ans. As of Turbo Pascal 7, you can use the implicitly declared Result variable instead of declaring your own and then assigning it to the function name. Result is readable and writable. Also, you don't need to worry about zero-length input. When the upper bound of a "for-to" loop is less than the lower bound, the loop simply doesn't run, so you don't need to check beforehand. Finally, I used the bracket operators on InStr to extract the character at the given index instead of getting a one-character-long string.

You say that your uses of Length and Copy are obviously incorrect, but you're wrong. Those functions continue to work just fine in Unicode. They know that Char is two bytes wide now, so if you call them on UnicodeString variables, you'll get the right characters. They also continue to work on AnsiString variables. In fact, they also work fine on WideString variables, even in older Delphi versions.
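
A quick way to convince yourself, as a minimal sketch assuming Delphi 2009 or later (the program name and the sample string are made up for illustration). Note that Length counts Char elements, that is, UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

program LengthCopyDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
var
  S, T: string; // UnicodeString in Delphi 2009 and later
begin
  S := 'a' + #$00E9 + ' b';  // 'a', e-acute (U+00E9), space, 'b'
  Writeln(SizeOf(Char));     // 2: Char is a two-byte WideChar
  Writeln(Length(S));        // 4: counts Char elements, not bytes
  T := Copy(S, 2, 1);        // one-character string holding the e-acute
  Writeln(Ord(T[1]));        // 233, i.e. $E9: the character survived intact
end.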

The primary problem in your code was where you stored a Unicode character into a non-Unicode string type.
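
To see that conversion in isolation, here is a minimal sketch, again assuming Delphi 2009 or later (the program and variable names are invented for illustration). What ends up in the ShortString depends on the active ANSI code page, so no specific value is claimed for it; where the code page has no mapping for the character, you should see 63, the code for a question mark.

program ShortStringDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
var
  U: string;      // UnicodeString
  S1: string[1];  // ShortString: one byte per character, no Unicode
  C: Char;        // WideChar
begin
  U := #$03A9;          // GREEK CAPITAL LETTER OMEGA
  S1 := Copy(U, 1, 1);  // implicit Unicode-to-ANSI conversion; the compiler warns about possible data loss
  C := U[1];            // no conversion: Char holds the UTF-16 code unit directly
  Writeln(Ord(S1[1]));  // code-page dependent
  Writeln(Ord(C));      // 937, i.e. $03A9
end.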

0
votes

Guessing from your problem description, you seem to process UTF-8-encoded strings. That's almost always a bad idea. Decode them into a saner representation first, and then operate on them. When you're done, you can encode everything as UTF-8 again.

I think the datatype for wide-character strings is "WideString" in Delphi; can't look it up right now.
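
If the data really does arrive as UTF-8, the decode/operate/encode round trip could look like the following minimal sketch. It assumes Delphi 2009 or later, where UTF8String, UTF8Encode and UTF8ToString are available and the plain string type is already UTF-16; the program and variable names are made up for illustration.

program Utf8RoundTripDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
var
  Raw: UTF8String; // bytes in UTF-8 encoding
  S: string;       // UnicodeString (UTF-16)
begin
  S := #$00E9 + 'a';        // e-acute followed by 'a'
  Raw := UTF8Encode(S);     // encode to UTF-8
  Writeln(Length(Raw));     // 3: e-acute takes two bytes in UTF-8, 'a' takes one
  S := UTF8ToString(Raw);   // decode back before doing any character work
  Writeln(Length(S));       // 2: one Char element per character here
  Writeln(Ord(S[1]));       // 233, i.e. $E9
end.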

0
votes

String[1] does not have a Unicode version.

Try Char instead.