I am seeing strange behavior in Perl while trying to decode a Unicode JSON string produced by a PHP script's json_encode function. I reduced the problem to the following code:

#!/usr/bin/perl
use CGI;
use JSON;
print CGI::header(-type=>'text/html', -charset=>'UTF-8');

print %{ decode_json('{"test_1" : "= \u00F9 ="}') }->{'test_1'};
print '<br>';
print %{ decode_json('{"test_2" : "= \u00F9 \u0121 ="}') }->{'test_2'};

When I run this script in a browser, I see the following:

= � =
= ù ġ =

The first line contains a "broken character"; the second is correct. What I think is happening is that, for some reason, Perl outputs the first string in ISO-8859-1 encoding: if I change the page encoding to ISO-8859-1, the first line is correct, but the second is broken.

My Perl version is 5.10.1 and the JSON version is 2.51.

Question: how can I force Perl's decode_json to return UTF-8 characters in the first print?

Note: I can fix the problem by manually converting the first output to UTF-8, but this requires installing an additional "Encoder" module, which I want to avoid.

Comment (daxim): The Encode module has shipped with Perl since v5.7.3.
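
The manual conversion mentioned in the note can be sketched with the core Encode module, so nothing extra needs to be installed. This is only an illustration under a few assumptions: it uses JSON::PP (bundled with Perl since 5.14) in place of JSON so that it runs without CPAN modules, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # stand-in for JSON; same decode_json interface
use Encode qw(encode);          # core module, no installation needed

# decode_json returns a hash reference holding Perl character strings.
my $data = decode_json('{"test_1" : "= \u00F9 ="}');

# Turn the character string into UTF-8 bytes by hand before printing.
my $encoded = encode('UTF-8', $data->{test_1});

# U+00F9 becomes the two bytes C3 B9, which a UTF-8 page renders as one character.
print $encoded, "\n";
```

Encoding each string by hand works, but it has to be repeated for every value that is printed.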

1 Answer


I tried your code, and it generated several warnings with "use warnings;".

If you want to be sure to get UTF-8 output, I believe you have to tell Perl so. Use "binmode(STDOUT, ":utf8");" or similar.

This works on the command line:

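use strict;
use warnings;
use JSON;

# Encode everything printed to STDOUT as UTF-8.
binmode(STDOUT, ":utf8");

# decode_json returns a hash reference, so dereference it directly with ->.
print decode_json('{"test_1" : "= \u00F9 ="}')->{test_1};
print '<br>';
print decode_json('{"test_2" : "= \u00F9 \u0121 ="}')->{test_2};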

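EDIT: AFAIK, this does not affect decode_json() itself, only the output from the Perl script. Without an encoding layer on the filehandle, Perl writes a string whose characters all fit in one byte (such as ù, U+00F9) as ISO-8859-1 bytes, while a string containing a wider character (such as ġ, U+0121) is written as UTF-8 with a "Wide character" warning. That would explain why only the first line looked broken. Unicode tutorials often tell you to state explicitly what encoding you want on your input and output filehandles.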
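
To see the effect of the output layer, here is a minimal sketch that writes the same decoded string to an in-memory filehandle with and without a UTF-8 layer and dumps the resulting bytes as hex. It assumes JSON::PP (core since Perl 5.14) as a stand-in for JSON, and written_bytes is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # stand-in for JSON; same decode_json interface

my $text = decode_json('{"test_1" : "= \u00F9 ="}')->{test_1};

# Hypothetical helper: write $string to an in-memory filehandle opened
# with $mode and return the bytes that were written, as hex.
sub written_bytes {
    my ($mode, $string) = @_;
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "open failed: $!";
    print {$fh} $string;
    close $fh;
    return join ' ', map { sprintf '%02x', ord } split //, $buffer;
}

# No layer: U+00F9 goes out as the single ISO-8859-1 byte f9.
my $raw = written_bytes('>', $text);

# UTF-8 layer: U+00F9 goes out as the two bytes c3 b9.
my $layered = written_bytes('>:encoding(UTF-8)', $text);

print "no layer:    $raw\n";
print "UTF-8 layer: $layered\n";
```

The sketch uses ":encoding(UTF-8)" rather than ":utf8" because it validates the data as strict UTF-8; for this output case either layer produces the same bytes.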