6
votes

Two starting points:

Is it correct to use

use uni::perl; # or any similar

in the PSGI application and/or in my modules?

uni::perl changes Perl's default IO to UTF-8, thus:

use open qw(:std :utf8);
binmode(STDIN,   ":utf8");
binmode(STDOUT,  ":utf8");
binmode(STDERR,  ":utf8");

Will doing so break something in Plack or its middlewares? Or is the only correct way to write apps for Plack to encode/decode explicitly at each open, that is, without the open pragma?
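To make the second alternative concrete, here is a minimal sketch of explicit encoding/decoding at open, with no open pragma and no changes to STDIN/STDOUT. The temporary file and the sample string are only illustrative assumptions, not part of any real application.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative scratch file (hypothetical; any path would do).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# A character string containing non-ASCII characters.
my $text = "na\x{ef}ve \x{263a}\n";

# Encode explicitly on this one handle only.
open my $out, '>:encoding(UTF-8)', $path or die "open $path: $!";
print {$out} $text;
close $out or die "close $path: $!";

# Decode explicitly when reading it back.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $read = <$in>;
close $in;

# Size of the encoded file in bytes, for comparison with length($text).
my $bytes_on_disk = -s $path;
```

The layer is attached per filehandle, so the standard handles keep whatever layers the server set up.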

2
Does Plack write to STDOUT or read from STDIN? If so, it's almost surely wrong (unless there's also a bug in Plack). I said "almost" because the use of binmode in Plack would make it not care. PS: now you know why it's not done by default; it breaks stuff. - ikegami
I'm hoping that @miyagawa gurusan will tell more.. :) And I understand why utf8 is not the default, but (IMO) new CPAN modules should be developed with "perl -CSDA" or the PERL_UNICODE env variable in mind. And miyagawa surely uses it in a Japanese environment, so he should know the right way.. ;) - cajwine
I think the "correct way" you list is broken. text/plain needs a charset so the other side knows what the bytes represent and how to decode them. - Ashley
@Ashley, yes, and thanks. The fragment has other errors too ($str vs $output). But it is not really related to the question. - cajwine

2 Answers

2
votes

You really don't want to set STDIN/STDOUT to be UTF-8 mode by default on Plack, because you don't know for instance whether they will be binary data transports. E.g. if those filehandles are the FastCGI protocol connector they will be carrying encoded binary structures and not UTF-8 text. They therefore must not have an encoding layer defined, or those binary structures will be mangled or rejected as invalid.
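As a rough sketch of what that means for application code: keep character strings inside the app and encode them to bytes only in the PSGI response, leaving STDIN/STDOUT alone. The app below is a hypothetical example (the greeting text and the hand-built $env are assumptions for illustration); it uses only the core Encode module and the plain PSGI response format.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical PSGI app: returns UTF-8 encoded bytes and declares the charset.
my $app = sub {
    my ($env) = @_;

    # Character string (escapes used so the source file encoding does not matter).
    my $text = "Gr\x{fc}\x{df}e, \x{4e16}\x{754c}\n";

    # Encode to bytes explicitly; no I/O layer on STDOUT is involved.
    my $bytes = encode_utf8($text);

    return [
        200,
        [
            'Content-Type'   => 'text/plain; charset=utf-8',
            'Content-Length' => length $bytes,   # byte length, not character length
        ],
        [ $bytes ],
    ];
};

# Call the app directly with a minimal hand-built environment.
my $res = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/' });
printf "status %d, %d bytes\n", $res->[0], length $res->[2][0];
```

In a real .psgi file the code reference would be the file's last expression, and the server (not the app) decides how the bytes reach the client.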

-2
votes

On modern GNU/Linux systems you should completely switch to UTF-8 globally. This means setting

LANG="xx_YY.UTF-8"
PERL_UNICODE=SDAL
PERL5OPT=-Mutf8

in your /etc/environment or /etc/sysconfig/i18n or /etc/default/locale or whatever your system configuration file is. Because of a RHEL/CentOS bug, I symlinked /etc/environment to sysconfig/i18n.

Scripts that rely on binary input should call binmode on STDIN/STDOUT/STDERR(?), use the open pragma, or be run with the -C0 option.
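A minimal sketch of the binmode variant, assuming the UTF-8 layer was switched on globally: the first binmode call only simulates that global setting, and the second is what a binary-oriented script would do itself.

```perl
use strict;
use warnings;

# Simulate a globally enabled UTF-8 layer on STDOUT (assumption for this demo).
binmode(STDOUT, ':utf8');

# Script-local override: back to raw bytes for binary output.
binmode(STDOUT, ':raw');

# Inspect the layers now active on the handle.
my @layers = PerlIO::get_layers(STDOUT);
my $has_utf8 = grep { $_ eq 'utf8' } @layers;
```

The same call can be repeated for STDIN and STDERR if those carry binary data as well.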

The problem is that some DBD drivers are buggy, e.g. DBD::JDBC, and you must set the utf8 flag by hand.

use Encode qw/_utf8_on/;
# Turn on the UTF-8 flag of each string in place (no validation is done).
_utf8_on($_) for @strings;