I am using XML::Code to create some XML data from a GET parameter received through the CGI module. The web server is Apache with its charset set to UTF-8, and the submitting form is on a page with a
<!DOCTYPE html>
<html lang="en-GB">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
header. The CGI script looks like this:
use strict;
use warnings;
use CGI;
use Encode;
use XML::Code;
binmode(STDOUT, ":utf8");
binmode(STDIN, ":utf8");
my $cgi = CGI->new();
print $cgi->header(-type => "text/xml", -charset => "utf-8");
my $object = $cgi->param("object");
$object = decode("utf-8", $object);   # CGI returns the parameter as raw UTF-8 octets
my $content = XML::Code->new("formdata");
$content->version ("1.0");
$content->encoding ("UTF-8");
my $sub_content = XML::Code->new("object");
$sub_content->set_text($object);
$content->add_child($sub_content);
$sub_content = XML::Code->new("isutf");
$sub_content->set_text(utf8::is_utf8($object));
$content->add_child($sub_content);
print $content->code();
When calling the CGI script with http://mydomain.com/cgi-bin/formdata.pl?object=ö, the output (as copied from Firebug) is:
<?xml version="1.0" encoding="UTF-8"?>
<formdata>
<object>ö</object>
<isutf>1</isutf>
</formdata>
Removing binmode(STDOUT, ":utf8") from the script gives me what I am looking for:
<?xml version="1.0" encoding="UTF-8"?>
<formdata>
<object>ö</object>
<isutf>1</isutf>
</formdata>
Now I know how to work around this issue, but I thought I would be safe by setting everything to UTF-8; if I am not, it would mean a lot more testing. Is this a bug in the Perl libraries, or in my thinking?

Best, Marcus