3
votes

I have been stuck for days on exporting UTF-8 CSV with chinese characters that shows garbled text on Windows Excel. I am using PHP and have already added the BOM byte mark and tried encoding but no luck at all.

They open fine on Notepad++, Google Spreadsheet and even on Mac Numbers. But not on Excel which is a requirement by the client. When opening with Notepad++ the encoding is shown as UTF-8. If I change it to UTF-8 manually and save, the file opens fine on Excel.

It seems as though the BOM byte mark doesn't get saved in the output as Notepad++ always detect it as UTF-8 without BOM.

Also, the CSV is not saved on server. Data is retrieved from DB and then exported directly out.

Here are my codes:

// Setup headers
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Description: File Transfer');
header("Content-type: text/csv");
header("Content-disposition: filename=".$filename.".csv");
header("Pragma: no-cache");

// First Method
$fp = fopen('php://output', 'w');
// Add BOM to fix UTF-8 in Excel, but doesn't work
fputs($fp, chr(0xEF) . chr(0xBB) . chr(0xBF) );

if ($fp) {

    fputcsv($fp, array("Header"), ",");
    fputcsv($fp, array($string_with_chinese_chars), ",");
}

fclose($fp);
exit();

// Second Method
$csv = "";
$sep = ",";
$newline = "\n"; // Also tried with PHP_EOL

$csv .= "Header";
$csv .= $newline;
$csv .= $string_with_chinese_chars;
$csv .= $newline;

// Tried all the below ways but doesn't work.
// Method 2.1
print chr(255) . chr(254) . mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');

// Method 2.2
print chr(239) . chr(187) . chr(191) . $csv;

// Method 2.3
print chr(0xEF).chr(0xBB).chr(0xBF);
print $newline;
print $csv;
4
Can you open the file in a hex editor before and after saving it in Notepad++, and see what is the difference? And maybe even add the hex dump of the file into your question, if it's short enough?Ilmari Karonen
Ok, I will give it a try and update again.darnpunk
Update: The downloaded file HEX starts with 0A EF BB BF. While the file after saving with Notepad++ starts with EF BB BF 0A EF BB BF. 0A looks to be a new line. Somehow that seems to be added at the start of the file even though there wasn't any part of the codes doing that. This is a shared hosting server and I don't have access to the php.ini.darnpunk

4 Answers

3
votes

Based on your comment above, it looks like your script is accidentally printing out a newline (hex 0A) before the UTF-8 BOM, causing Excel not to recognize the output as UTF-8.

Since you're using PHP, make sure that there's no empty line before the <?php marker in your script, or in any other PHP file that it might include. Also make sure that none of the files you include has any whitespace after the closing ?> marker, if there is one.

In practice, this can be quite hard to do, since many text editors insist on always appending a newline to the end of the last line. Thus, the safest and easiest solution is to simply leave out the ?> marker from your PHP files, unless you intend to print out whatever comes after it. PHP does not require the ?> to be present, and using it in files that are not meant to be mixed PHP and literal template HTML (or other text) is just asking for bugs like this.

3
votes

Below code worked for me. Output utf-8-bom characters before csv content:

  echo "\xEF\xBB\xBF"; // utf-8 bom 
  echo $csv;
2
votes

I usually do it this way:

header('Content-Type: application/csv');
header('Content-Disposition: attachment; filename="filename.csv"');
header('Cache-Control: max-age=0');

// BOM header UTF-8
echo "\xEF\xBB\xBF";

$fh = @fopen('php://output', 'w');

...

And I use the ";" as seperator as excel most likley doesnt autoformat the ","

0
votes

Hope this can help someone. What worked for me was I had to put both:

...
echo chr(0xEF) . chr(0xBB) . chr(0xBF);
$file = fopen('php://output', 'w');
fputs($file, chr(0xEF) . chr(0xBB) . chr(0xBF));
...

I'm not an expert in PHP so I can't explain why this works by I hope this helps someone because I had a hard time also solving this problem.