My Perl script and its input data file are both in BIG5 Chinese encoding.
The string data contains HTML entities, e.g. for Japanese characters.
The result displays perfectly when viewed in the browser.
But for further data manipulation, I need to convert everything into UTF-8.
For example:
From BIG5 encoding
一&#12392;三
To UTF-8 encoding
一と三
Here's the code I've tried:
#!/usr/local/bin/perl
use Encode qw/encode decode/;
use HTML::Entities;
print "Content-type: text/html\n\n";
$str = "&#12392;";
$str = encode('utf8', decode("big5",$str));
print "$str\n";
decode_entities($str);
print "$str\n";
$str2 = "一&#12392;三";
$str2 = encode('utf8', decode("big5",$str2));
print "$str2\n";
decode_entities($str2); # where the issue is
print "$str2\n";
Here's the result after running the above code.
&#12392;
と
一&#12392;三
ä¸とä¸
Please note that the script itself is also saved in BIG5 encoding.
After decode_entities($str2), it seems that the UTF-8 bytes of the Chinese characters are being decoded as well, and that is what corrupts the output.
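A minimal sketch of how that suspicion could be checked, assuming the entity is the numeric reference &#12392; (と) and using hex escapes so the snippet does not depend on how the file is saved. If decode_entities() has to insert a character above 255 into a byte string, the existing bytes appear to be treated as one character each:

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTML::Entities qw(decode_entities);

# UTF-8 bytes of 一 (E4 B8 80) followed by the assumed entity for と.
my $bytes = "\xE4\xB8\x80&#12392;";

# Decoding in place inserts U+3068, which cannot live in a byte string,
# so the string is upgraded to a character string.
decode_entities($bytes);

# The three original bytes now count as three separate characters,
# plus one character for と.
printf "%d characters\n", length $bytes;

# Encoding to UTF-8 at this point would encode those three
# byte-characters a second time; print the result as hex to inspect it.
printf "%vX\n", encode('UTF-8', $bytes);
```

If the count is 4 rather than 2, the Chinese character was never seen as a single character, which would match the garbled output above.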
How do I fix this? Or can I limit decode_entities() so that it applies only to the &xxxxx; pattern?
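One possible direction, shown only as a sketch under the same assumption about the entity (&#12392;): keep the text as a Perl character string while the entities are decoded, and encode to UTF-8 once at the very end. The variable names and the hex-escaped BIG5 bytes below are illustrative, not taken from the real script:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use HTML::Entities qw(decode_entities);

# BIG5 bytes for 一 (A4 40) and 三 (A4 54), with the assumed entity between
# them. Hex escapes keep the sketch independent of the file's own encoding.
my $big5 = "\xA4\x40&#12392;\xA4\x54";

# 1. BIG5 bytes -> Perl character string.
my $chars = decode('big5', $big5);

# 2. Decode entities while the data is still characters, not bytes.
decode_entities($chars);

# 3. Encode to UTF-8 bytes only once, just before output.
my $utf8 = encode('UTF-8', $chars);

print "$utf8\n";
```

With this ordering decode_entities() never sees raw UTF-8 bytes, so there should be no need to restrict it to the &xxxxx; pattern.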