3
votes

I'm writing my first Unicode application with Delphi XE2, and I've stumbled upon an issue with GET requests to a Unicode URL.

In short, it's a routine in an MP3 tagging application that takes a track title and an artist, and queries Last.FM for the corresponding album, track number, and genre.

I have the following code:

function GetMP3Info(artist, track: string) : TMP3Data; //<---(This is a record)
var
  TrackTitle,
  ArtistTitle : WideString;
  webquery    : WideString;

[....]

WebQuery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + apikey + '&artist=' + artist + '&track=' + track);

//[processing the result in the web query, getting the correction for the artist and title]

// eg: for artist := Bucovina and track := Mestecanis, the corrected values are 
//ArtistTitle := Bucovina;
// TrackTitle := Mestecăniș;

//Now here is the tricky part:

webquery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=' + apikey + '&artist=' + unescape(ArtistTitle) + '&track=' + unescape(TrackTitle)); 
//the unescape function replaces spaces (' ') with '+' to comply with the last.fm requests

[some more processing]

end;

The webquery in a TMemo looks just right:

http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestecăniș

Yet, when I try to send a GET request to the webquery using TIdHTTP (with the ContentEncoding property set to 'UTF-8'), I see in Wireshark that TIdHTTP is GET'ing data using an ANSI request URL:

/2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestec?ni?

Here are the full headers for the GET requests and responses:

GET /2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestec?ni? HTTP/1.1
Content-Encoding: UTF-8
Host: ws.audioscrobbler.com
Accept: text/html, */*
Accept-Encoding: identity
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 SearchToolbar/1.22011-10-16 20:20:07

HTTP/1.0 400 Bad Request
Date: Tue, 09 Oct 2012 20:46:31 GMT
Server: Apache/2.2.22 (Unix)
X-Web-Node: www204
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Max-Age: 86400
Cache-Control: max-age=10
Expires: Tue, 09 Oct 2012 20:46:42 GMT
Content-Length: 114
Connection: close
Content-Type: text/xml; charset=utf-8;

<?xml version="1.0" encoding="utf-8"?>
<lfm status="failed">
<error code="6">
    Track not found
</error>
</lfm>

The question that puzzles me is: am I overlooking anything related to setting the properties of the TIdHTTP component? How can I stop the well-formed URL I'm composing in the application from being sent to the server in the wrong format?

2
@TLama, your piece of advice worked. I owe you some points, but your previous comment appears to have been deleted. How can I accept your (now deleted) answer? - Bogdan Botezatu
It was me who deleted that comment; I've made an answer post from it. - TLama

2 Answers

2
votes

To get the XML response from the track.getCorrection function you can use something like this:

uses
  IdHTTP, IdURI;

function GetMusicDataXML(const AArtist, ATrack: string): string;
var
  URL: string;
  IdHTTP: TIdHTTP;
const
  APIKey = '1a3d8080e427f4dxxxxxxxxxxxxxxxxx';
begin
  Result := '';
  IdHTTP := TIdHTTP.Create;
  try
    URL := TIdURI.URLEncode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + APIKey + '&artist=' + AArtist + '&track=' + ATrack);
    Result := IdHTTP.Get(URL);
  finally
    IdHTTP.Free;
  end;
end;
2
votes
var
  ...
  webquery    : WideString; 
...
WebQuery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + apikey + '&artist=' + artist + '&track=' + track); 

This does not do what you think it does. In XE2, UTF8Encode() returns a UTF-8 encoded RawByteString, which you are then assigning to a WideString. The RTL will decode the UTF-8 data back to a UTF-16 string. When you pass that string to TIdHTTP.Get(), it will convert it to ASCII when the actual HTTP request is formatted, losing any non-ASCII characters.

As @TLama said, you have to encode the URL using TIdURI before passing it to TIdHTTP. TIdURI will encode Unicode characters as UTF-8 (by default - you can specify the encoding if needed) and then encode the resulting data in an ASCII-compatible format that TIdHTTP will not lose.
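
To make the two steps concrete (Unicode to UTF-8 bytes, then bytes to ASCII-safe text), here is a minimal sketch of percent-encoding done by hand with the RTL only. PercentEncodeUTF8 is a hypothetical helper name, not an Indy function, and this is not TIdURI's actual implementation; it is stricter than most URL encoders because it escapes every byte outside the RFC 3986 unreserved set. It is meant for a single parameter value, not for a whole URL.

```delphi
uses
  SysUtils;

// Hypothetical helper (not part of Indy): percent-encodes one query value.
// Assumption: every byte outside the RFC 3986 "unreserved" set is escaped.
function PercentEncodeUTF8(const AValue: string): string;
var
  Bytes: TBytes;
  I: Integer;
begin
  Result := '';
  // Step 1: turn the UTF-16 string into raw UTF-8 bytes.
  Bytes := TEncoding.UTF8.GetBytes(AValue);
  // Step 2: emit each byte either as-is or as a %XX escape.
  for I := 0 to High(Bytes) do
    if Bytes[I] in [Ord('A')..Ord('Z'), Ord('a')..Ord('z'),
                    Ord('0')..Ord('9'),
                    Ord('-'), Ord('.'), Ord('_'), Ord('~')] then
      Result := Result + Char(Bytes[I])
    else
      Result := Result + '%' + IntToHex(Bytes[I], 2);
end;
```

With this sketch, the track title from the question, Mestecăniș, comes out as Mestec%C4%83ni%C8%99, which is pure ASCII and therefore survives the request formatting unchanged. Note that a space becomes %20 here rather than '+'.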