If an AnsiString contains Unicode or UTF-8 characters, is it possible to strip those characters from the string? In this particular case the AnsiString contains EXIF parameters.

Edit

When the string is read it is visible as: Copyright © 2013 The States of Guernsey (Guernsey Museums & Galleries)

In one case, the copyright symbol © is encoded as a UTF-8 sequence (the bytes 0xC2 and 0xA9). Delphi 7 and Delphi 2010 treat each byte as a separate ANSI character, displaying "Â" (C2) followed by "©" (A9) instead of decoding the UTF-8 sequence. EXIF tags, including the Copyright tag (33432), should be plain ASCII, not UTF-8 or Unicode.

So if an AnsiString contains one or more of these characters, can they be stripped from the string, or do they have to be edited manually?

Edit2

Attempting to recover the UTF-8 text, I tried:

// remove the null terminator from a string (part of an ImageEn unit)
function RemoveNull(sValue: string): string;
begin
  result := trim(svalue);
  if (result <> '') and (result[length(result)] = #0) then
    SetLength(result, length(result) - 1);
  result := trim(result);
end;

EXIF_Copyright is defined by ImageEn as AnsiString, and I declared utf8: UTF8String;

// EXIF_Copyright
// Shows copyright information
SetLength(utf8, Length(EXIF_Copyright)); // [DCC Error] iexEXIFRoutines.pas(911): E2026 Constant expression expected
Move(Pointer(EXIF_Copyright)^, Pointer(utf8)^, Length(EXIF_Copyright));
_EXIF_Copyright: result := RemoveNull(EXIF_Copyright);

This will not compile (EXIF_Copyright is an AnsiString). Unfortunately, I have little experience dealing with UTF-8.

If it's an AnsiString, can't you access all the characters? - Rob

1 Answer

The simplest approach is to read your UTF-8 string into a variable of type UTF8String and then assign to another string variable.

You can assign to an AnsiString if you want, but I don't understand why you would do that. If you do convert to ANSI, any characters that cannot be represented will be converted to question marks. If you are desperate to strip non-ASCII characters, read into UTF8String, convert to string, and strip characters > 127.
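If you do go the stripping route, a minimal sketch could look like the following. StripNonAscii is a hypothetical helper name, not something from ImageEn or the RTL, and the sketch assumes the text has already been decoded into a string as described above. It drops every character whose ordinal value is above 127; in a Unicode Delphi that test is applied per UTF-16 code unit, so both halves of a surrogate pair are dropped as well.

```pascal
program StripNonAsciiDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns S with every character above #127 removed.
function StripNonAscii(const S: string): string;
var
  I, N: Integer;
begin
  // Allocate the worst case up front, then shrink to the kept length.
  SetLength(Result, Length(S));
  N := 0;
  for I := 1 to Length(S) do
    if Ord(S[I]) <= 127 then
    begin
      Inc(N);
      Result[N] := S[I];
    end;
  SetLength(Result, N);
end;

begin
  // #$00A9 is the copyright sign; it is removed, the ASCII text is kept.
  Writeln(StripNonAscii('Copyright '#$00A9' 2013'));
end.
```

Note that stripping leaves the surrounding spaces in place, so the example ends up with two consecutive spaces where the symbol was. You may want to collapse those afterwards.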

As I understand it, the standard mandates ASCII but it's common now for EXIF text to be encoded with UTF-8.

I suggest you simply read the text into a UTF8String and leave it at that.

Your library gives you an AnsiString that actually contains UTF-8 text. So you can simply convert to UTF8String like this:

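function ReinterpUTF8storedInAnsiString(const ansi: AnsiString): string;
var
  utf8: UTF8String;
begin
  // Copy the raw bytes unchanged, so that no code page conversion
  // is applied on the way from AnsiString to UTF8String.
  SetLength(utf8, Length(ansi));
  Move(Pointer(ansi)^, Pointer(utf8)^, Length(ansi));
  // The conversion from UTF-8 happens here, on assignment to string.
  Result := string(utf8);
end;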

Now you will have the text that the file creator intended you to see.
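As a quick sanity check, here is a small sketch that feeds the two bytes from the question (0xC2, 0xA9) through the same byte-copy technique. It assumes a Unicode Delphi (2009 or later), where string is UnicodeString and assigning a UTF8String to it performs the decoding. The program name and the variables raw and s are only illustrative.

```pascal
program ReinterpretDemo;
{$APPTYPE CONSOLE}

// Same byte-copy technique as in the answer: reinterpret the bytes of an
// AnsiString as UTF-8 and let the assignment to string do the decoding.
function ReinterpUTF8storedInAnsiString(const ansi: AnsiString): string;
var
  utf8: UTF8String;
begin
  SetLength(utf8, Length(ansi));
  Move(Pointer(ansi)^, Pointer(utf8)^, Length(ansi));
  Result := string(utf8);
end;

var
  raw: AnsiString;
  s: string;
begin
  // Build the two raw bytes explicitly to avoid any literal conversion.
  SetLength(raw, 2);
  raw[1] := AnsiChar($C2);
  raw[2] := AnsiChar($A9);

  s := ReinterpUTF8storedInAnsiString(raw);

  // Report how many characters came out and the code of the first one.
  Writeln('Length: ', Length(s));
  Writeln('First char code: ', Ord(s[1]));
end.
```

On a Unicode Delphi this should report a single character with code 169 ($A9, the copyright sign). On Delphi 7, string is still an ANSI type, so the assignment does not decode anything and the two bytes come back unchanged.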