Why is Encode::decode('UTF-8', $var) still needed when everything is already in UTF-8?

Question

In a Webapp I maintain I try to keep everything in UTF-8:

the Database (CHARSET=utf8)
the source files (use utf8; written in utf8)
the templates (for Template Toolkit, using ENCODING => utf8)
user input and output (charset=utf8 header in HTTP, binmode :utf8 for STDIN and STDOUT)

But I still need to use Encode::decode('UTF-8', $data) on data coming from the database, or it ends up double-encoded or otherwise broken.

Why is this? How can I get rid of this annoying extra step? Shouldn't there be a way to keep everything in UTF-8 all the time, without having to convert anything by hand?

w.k w.k · Accepted Answer · 2011-06-26T22:34:55

To get UTF-8 back from the database properly, you need to tell the driver so explicitly when you connect:

# mysql_enable_utf8 asks DBD::mysql to hand back decoded character strings
my $dbh = DBI->connect( "dbi:mysql:dbname=$db;host=localhost",
       "user", "pwd", { mysql_enable_utf8 => 1 } );

As I asked in my question here, there are still some problems with it, but in most cases it works fine.
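To see what the flag saves you from, here is a minimal sketch using only the core Encode module. The variable $bytes is a hypothetical stand-in for a value fetched without mysql_enable_utf8 (no database is involved), on the assumption that the driver then returns the raw UTF-8 bytes as an undecoded byte string:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for a column value fetched without mysql_enable_utf8:
# the two UTF-8 bytes of "é" (U+00E9), still undecoded.
my $bytes = "\xC3\xA9";

# Decoding turns the two-byte string into a one-character string.
my $chars = decode('UTF-8', $bytes);

# An output layer such as binmode(STDOUT, ':utf8') encodes on the way out.
# Decoded text round-trips to the original two bytes ...
my $good = encode('UTF-8', $chars);

# ... but undecoded bytes are taken as two Latin-1 characters and each one
# is encoded again. This is the double encoding described in the question.
my $double = encode('UTF-8', $bytes);

printf "length of raw bytes:  %d\n", length $bytes;
printf "length after decode:  %d\n", length $chars;
printf "correct output bytes: %d\n", length $good;
printf "double-encoded bytes: %d\n", length $double;
```

So the data is UTF-8 all along; what Perl is missing is the information that those bytes are text, and decoding (by hand or in the driver) is what supplies it.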

The "why" part is much harder to answer. As Denis pointed out, there was a pretty heavy thread about the "why" recently; maybe it helps you understand the related issues. I suggest using the utf8::all module to make UTF-8 handling much easier and cleaner.
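For a rough idea of what that looks like, here is a sketch using only core pragmas. As far as I understand it, this approximates a subset of what utf8::all switches on (the module itself does more, and needs to be installed from CPAN); the sample string is just an illustration and the file is assumed to be saved as UTF-8:

```perl
use strict;
use warnings;
use utf8;                            # string literals in this file are UTF-8 text
use open qw(:std :encoding(UTF-8));  # STDIN/STDOUT/STDERR and default open() layers

# Illustrative value only; with "use utf8" this is a string of characters,
# not of bytes, so length counts characters.
my $word = "naïve";

print length($word), "\n";
print "$word\n";
```

With utf8::all installed, the two pragma lines would be replaced by a single "use utf8::all;". Neither variant touches the database handle, so the connect option above is still needed.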
