fgetcsv encoding issue (PHP)

Question

I am being sent a csv file that is tab delimited. Here is a sample of what I see:

Invoice: Invoice Date   Account: Name   Bill To: First Name Bill To: Last Name  Bill To: Work Email Rate Plan Charge: Name  Subscription: Device Serial Number
2021-03-10  Test Company    Wally   Kolcz   [email protected]   Sample plan A0H1234567890A

I wrote a script to open, read and loop over the values but I get weird stuff after:

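$line = 1; // assumed starting value; the original snippet does not show where $line is initialised
if (($handle = fopen($user_file, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
        // skip the header row (line 1)
        if ($line > 1 && isset($data[1])) {
            $user = [
                'EmailAddress' => $data[4],
                'Name' => $data[2].' '.$data[3],
            ];
        }

        $line++;
    }
    fclose($handle);
}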

Here is what I get when I dump the first line.

array:7 [
  0 => b"ÿþI\x00n\x00v\x00o\x00i\x00c\x00e\x00:\x00 \x00I\x00n\x00v\x00o\x00i\x00c\x00e\x00 \x00D\x00a\x00t\x00e\x00"
  1 => "\x00A\x00c\x00c\x00o\x00u\x00n\x00t\x00:\x00 \x00N\x00a\x00m\x00e\x00"
  2 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00F\x00i\x00r\x00s\x00t\x00 \x00N\x00a\x00m\x00e\x00"
  3 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00L\x00a\x00s\x00t\x00 \x00N\x00a\x00m\x00e\x00"
  4 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00W\x00o\x00r\x00k\x00 \x00E\x00m\x00a\x00i\x00l\x00"
  5 => "\x00R\x00a\x00t\x00e\x00 \x00P\x00l\x00a\x00n\x00 \x00C\x00h\x00a\x00r\x00g\x00e\x00:\x00 \x00N\x00a\x00m\x00e\x00"
  6 => "\x00S\x00u\x00b\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00:\x00 \x00D\x00e\x00v\x00i\x00c\x00e\x00 \x00S\x00e\x00r\x00i\x00a\x00l\x00 \x00N\x00u\x00m\x00b\x00e\x00r\x00"
]

I tried adding:

header('Content-Type: text/html; charset=UTF-8');
$data = array_map("utf8_encode", $data);
setlocale(LC_ALL, 'en_US.UTF-8');

And when I dump mb_detect_encoding($data[2]), I get 'ASCII'...

Any way to fix this so I don't have to manually update the file each time I receive it? Thanks!

The problem is not simply "file has a BOM" please do not close it as such. — Sammitch

Andrea · Accepted Answer · 2021-03-15T20:34:40

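Looks like the file is in UTF-16: every other byte is null, and the leading ÿþ in the first field is the byte pair FF FE, which is the little-endian UTF-16 byte order mark.

You probably need to convert the whole file contents to UTF-8 before parsing, with something like mb_convert_encoding($data, "UTF-8", "UTF-16"); where $data is the raw contents of the file rather than the already-split fields.

But you can't really use fgetcsv() directly on the original file in that case, because it splits lines and fields on single bytes and so leaves stray null bytes in the UTF-16 data…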
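
One way to put the conversion together, as a rough sketch rather than a tested drop-in: read the raw bytes, strip the byte order mark, convert to UTF-8, and then hand the converted text to fgetcsv() through a php://temp stream. The function name readUtf16Tsv is made up for this example, the mbstring extension is assumed to be loaded, and the fallback to little-endian when no BOM is present is an assumption based on the dump in the question.

```php
<?php
// Hypothetical helper: reads a tab-delimited UTF-16 file and returns its rows as UTF-8 arrays.
function readUtf16Tsv(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Pick the byte order from the BOM and drop it; assume little-endian if there is none.
    $encoding = 'UTF-16LE';
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $encoding = 'UTF-16BE';
        $raw = substr($raw, 2);
    }

    $utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);

    // fgetcsv() needs a stream, so copy the converted text into a temporary one.
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $utf8);
    rewind($handle);

    $rows = [];
    while (($data = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $rows[] = $data;
    }
    fclose($handle);

    return $rows;
}
```

With this, the loop from the question can iterate over readUtf16Tsv($user_file) and skip the first row as the header. The whole file is held in memory, which should be fine for invoice-sized exports but is worth reconsidering for very large files.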
