0
votes

I have a very simple Perl script that works correctly on the terminal, but when run as a CGI script it produces garbage. The script takes HTML-entity-encoded data, decodes it, and prints it. I have tried various setups, such as using "Encode" to change the output and setting STDOUT to utf8 mode, but none of them help. I have also tried changing the CGI environment to match the terminal environment. Still no luck.

Here is the script

#!/usr/bin/perl 
use HTML::Entities qw(encode_entities_numeric decode_entities);
use Encode qw/encode decode/;
binmode(STDOUT, ":utf8");
#$ENV{'PERL_UNICODE'} = 'D';
#$ENV{'LANG'} = 'en_US.UTF-8';
#$ENV{'TERM'} = 'vt100';
#$ENV{'SHELL'} = '/bin/bash';
#binmode(STDOUT, ":utf8");
print "Content-type: text/html\n\n";
my $y = decode_entities("Συστήματα_&#x391;νίχνευσης_Εισ.pdf");
#print encode("UTF8",$y);
print $y;

On the terminal the output is clean:

perl test.pl
Content-type: text/html

Συστήματα_Ανίχνευσης_Εισ.pdf

But when run as CGI, the output is garbled:

ΣυστηÌματα_ΑνιÌχνευσης_Εισ.pdf

I am sort of stuck, as I cannot find any simple way to solve this. I also tried "encode_utf8" and utf8::upgrade on the variable, but still no luck. Any experience with this would help a lot!

Thanks Vijay

1 Answer

6
votes

When interpreting an HTML document, the browser needs to know its encoding. If none is declared, the browser falls back to a legacy default (commonly windows-1252 in Western locales), not UTF-8. Your script emits UTF-8 bytes, so the browser decodes them with the wrong encoding and displays garbage.

Instead, you should specify the encoding explicitly, such as by printing a meta tag

<meta charset="utf-8">

or by including the encoding in the content type:

Content-type: text/html; charset=utf-8

Here, the content type header seems the most appropriate choice, since the script already prints it.
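
A minimal sketch of the header-based fix, under a few assumptions: the helper name decoded_name and the shortened sample string are only illustrative, and the input is written purely as numeric entities so the source file stays ASCII. It pairs an output encoding layer on STDOUT with a matching charset in the header, so the bytes sent and the encoding declared agree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical helper for this sketch: decode a short, entity-encoded
# file name into a Perl character string.
sub decoded_name {
    return decode_entities("&#x391;&#x3BD;.pdf");
}

# Encode character strings as UTF-8 bytes when they are printed.
binmode(STDOUT, ":encoding(UTF-8)");

# Declare the same encoding in the HTTP header so the browser decodes
# the body as UTF-8 instead of guessing a legacy default.
print "Content-Type: text/html; charset=utf-8\n\n";
print decoded_name(), "\n";
```

If the header cannot be changed (for example, when the server overrides it), printing the meta tag early in the document is the fallback.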