
I found a lot of answers about this for Windows, but nothing very useful for Linux that I hadn't already checked.

My server runs CentOS 6 with an ext4 filesystem, PHP 5.4.39, and MySQL 5.5.42.

Everything is set to use UTF-8: the LANG environment variable, the database, the MySQL client connection, PHP, etc.

But with the following code I can't read files whose names contain special characters such as è, à, ò, ì.

The same code works on my Mac (PHP and MySQL installed from ports).

As you can see, the code contains some commented-out tests. mb_detect_encoding($track,'auto') returns UTF-8.

    $db->bind("id",$this->request->get(1));
    $file = $db->row("select f.name from file f where f.id = :id and f.type = 'mp3';");
    $track = realpath(__DIR__ . '/../') . $file['name'];
    //$track = mb_convert_encoding(realpath(__DIR__ . '/../' . $file['name']), "UTF-8");
    //$track = iconv('utf-8', 'cp1252', realpath(__DIR__ . '/../' . $file['name']));
    //echo mb_detect_encoding($track,'auto');

    if (file_exists($track)) {
        header("Content-Transfer-Encoding: binary");
        header("Content-Type: audio/mpeg"); // a Content-Type header takes a single media type
        header('Content-length: ' . filesize($track));
        header('Cache-Control: no-cache');
        readfile($track);
    }

Any suggestions?

UPDATE

It looks like the problem lies in PHP's file-related functions, which for some reason don't seem to treat filenames as UTF-8.

I wrote a simple PHP script and ran it from the shell, and I see the same behavior even when the file name is hard-coded in the script (so no database is involved).

PHP settings:

    $ php -i | grep UTF
    default_charset => UTF-8 => UTF-8
    LANG => en_US.UTF-8
    LC_CTYPE => en_US.UTF-8
    _SERVER["LANG"] => en_US.UTF-8
    _SERVER["LC_CTYPE"] => en_US.UTF-8

    $ php --version
    PHP 5.4.39 (cli) (built: Mar 19 2015 06:25:23)


To be clear, something like this does not work:

    $track = "/path/to/existing/file/with/specialchars";
    echo "-> " .$track . "\n";
    if (file_exists($track)) {
        echo "OK " .$track . "\n";
    }

"LANG environment variable"? That is irrelevant to MySQL (I think). - Rick James

1 Answer


The important thing to remember is that on Linux, filenames don't have a character encoding; they are just 8-bit byte strings.

For example, if you upload a file via FTP and the FTP server uses the Windows-1252 character encoding, the filename on disk will be stored as 8-bit Windows-1252 bytes. Trying to open that file with a UTF-8 encoded name will fail, no matter what the locale or LANG is.
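To make the mismatch concrete, here is a minimal sketch comparing the two encodings of the same visible name. The file name `cafè.mp3` is made up for illustration, and the bytes are written as escapes so the result does not depend on how the script itself is saved.

```php
<?php
// Hypothetical file name "cafè.mp3", spelled out byte by byte.
$utf8   = "caf\xC3\xA8.mp3"; // "è" as UTF-8: two bytes, C3 A8
$cp1252 = "caf\xE8.mp3";     // "è" as Windows-1252: one byte, E8

// Both print as the same name in a suitably configured terminal,
// but to the filesystem they are two different byte strings.
var_dump($utf8 === $cp1252);  // bool(false)
echo bin2hex($utf8), "\n";    // 636166c3a82e6d7033
echo bin2hex($cp1252), "\n";  // 636166e82e6d7033
```

If the name on disk holds the second byte sequence, file_exists() called with the first one is looking for a file that is not there.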

This is unlike OS X, where the filename is always UTF-8, and Windows, where the filename is always UTF-16.

As you've probably found, strings in PHP are also just 8-bit byte strings, so it's impossible to know for sure which encoding a given string uses. You can easily end up with two strings that are encoded in different character sets.
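A short sketch of what "just bytes" means in practice, assuming the mbstring extension is loaded (the question already uses mb_* functions). PHP does not record an encoding anywhere; you can only check whether a string happens to be valid in the encoding you expect.

```php
<?php
$utf8   = "\xC3\xA8"; // "è" encoded as UTF-8 (two bytes)
$cp1252 = "\xE8";     // "è" encoded as Windows-1252 (one byte)

echo strlen($utf8), "\n";             // 2: strlen() counts bytes
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 1: one character when read as UTF-8

// A validity check is the most PHP can offer; it is not proof of origin.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($cp1252, 'UTF-8')); // bool(false)
```

This is also why mb_detect_encoding() should be read as a guess rather than a fact about the string.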

My advice is to make sure you know the encoding of every string you read or output, including form fields and filenames.

Therefore, make sure the filename on disk is UTF-8 and the filename value you put into the database is UTF-8. Then, when you pull the value from the DB, the file variable will already be UTF-8 encoded and ready to pass to functions such as fopen, file_exists, or readfile.
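One possible way to get the names on disk into UTF-8 is sketched below. Both function names are hypothetical, and the sketch assumes that any name which is not valid UTF-8 was written as Windows-1252 (as in the FTP example); pass a different source encoding if that does not match your files. It relies on the iconv and mbstring extensions, so try it on a copy of the directory first.

```php
<?php
// Hypothetical helper: return $name as UTF-8.
// Names that are already valid UTF-8 are returned unchanged; anything else
// is ASSUMED to be in $assumedSource (Windows-1252 by default).
function to_utf8_name($name, $assumedSource = 'CP1252')
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    $converted = @iconv($assumedSource, 'UTF-8', $name);
    // If the conversion fails, leave the name alone rather than guess.
    return $converted === false ? $name : $converted;
}

// Hypothetical helper: rename every non-UTF-8 entry in $dir to its UTF-8
// form. Returns an array of old name => new name for the renamed entries.
function normalize_dir_names($dir, $assumedSource = 'CP1252')
{
    $renamed = array();
    $entries = scandir($dir);
    if ($entries === false) {
        return $renamed;
    }
    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $fixed = to_utf8_name($name, $assumedSource);
        // Skip names that need no change and never overwrite an existing file.
        if ($fixed !== $name
            && !file_exists($dir . '/' . $fixed)
            && rename($dir . '/' . $name, $dir . '/' . $fixed)) {
            $renamed[$name] = $fixed;
        }
    }
    return $renamed;
}
```

A call such as `normalize_dir_names('/path/to/files')` (placeholder path) would then return the list of renamed entries. Note that a Windows-1252 name can, in rare cases, also happen to be valid UTF-8, in which case this check leaves it untouched.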