1
votes

There seems to be an encoding issue or bug in PHP with fputcsv() and fgetcsv().

The following PHP code:

$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

$fh = fopen($file = 'php://temp', 'rb+');

fputcsv($fh, $row_before);

rewind($fh);

$row_after = fgetcsv($fh);

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);

Gives me this output:

BEFORE:
array (
  0 => 'A',
  1 => '["a","\\\\","b"]',
  2 => 'B',
)

AFTER:
array (
  0 => 'A',
  1 => '["a","\\\\',
  2 => 'b""]"',
  3 => 'B',
)

So clearly, the data is damaged on the way. Originally there were just 3 cells in the row, afterwards there are 4 cells in the row. The middle cell is split thanks to the backslash that is also used as an escape character.

See also https://3v4l.org/nc1oE Or here, with explicit values for delimiter, enclosure, escape_char: https://3v4l.org/Svt7m

Is there any way I can sanitize / escape my data before writing to CSV, to guarantee that the data read from the file will be exactly the same?

Is CSV a fully reversible format?

EDIT: The goal would be a mechanism to properly write and read ANY data as csv, so that after one round trip the data is still the same.

EDIT: I realize that I do not really understand the $escape_char parameter. See also fgetcsv/fputcsv $escape parameter fundamentally broken Maybe an answer to this would also bring us closer to a solution.

5
CSV has loads of limitations, from embedded quotes to multiline data. This is why things like XML, JSON and various other formats are used. - Nigel Ren
I have plenty of embedded quotes and line breaks in data that I export to CSV, and until now there was never a (visible) problem. - donquixote
CSV with line breaks in data isn't portable, if you just want data for your own application, why bother with CSV and just write any old format you want - and as your using all sorts of separators - you are effectively doing that. - Nigel Ren
1. Depends on the application. E.g. LibreOffice Calc has no problem with it. It might be nonstandard, but the world does not care. 2. Because CSV (or PHP-flavoured CSV) is already supported natively, 3. I am using the standard separators mostly. The only reason I use different separators is to test if PHP will behave differently. - donquixote
Also, the application I am developing has all sorts of CSV from external sources already, so I prefer to stick to this one format instead of inventing new formats. - donquixote

5 Answers

4
votes

The culprit is that fputcsv() uses an escape character, which is a non-standard extension to CSV. (Well, as far as RFC 7111 can be regarded as standard.) Basically, this escape character would have to be disabled, but passing an empty string as $escape to fputcsv() doesn't work. Usually, passing a NUL character should give the desired results, however, see https://3v4l.org/MlluN.

2
votes

UPDATE Jan 2020

Since PHP 7.4, passing an empty string as escape char fixes the problem! https://www.php.net/manual/en/function.fgetcsv.php

Demo https://3v4l.org/33Wja - see difference of PHP 7.4 vs older versions. (this is the same snippet as below, just with empty string as escape char)

Original answer

Contrary to what others are saying, I claim that this is a PHP bug. I am going to report it, and update this answer.

EDIT: Now reported here, https://bugs.php.net/bug.php?id=74713

Discussed in this answer:

  • Does changing the delimiter help? -> Not really.
  • Could fputcsv() be fixed? -> Yes.

Does changing the delimiter help?

It can be shown that this is reproducible with any combination of delimiter, enclosure and escape character.

https://3v4l.org/a29kR

$delimiter = 'X';
$enclosure = 'Y';
$escape_char = "Z";

$row_before = [
  'A',
  "[{$enclosure}a{$enclosure}{$delimiter}{$enclosure}{$escape_char}{$escape_char}{$enclosure}{$delimiter}{$enclosure}b{$enclosure}]",
  'B',
];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

$fh = fopen($file = 'php://temp', 'rb+');

fputcsv($fh,$row_before,$delimiter,$enclosure, $escape_char);

rewind($fh);

$row_plain = fread($fh, 1000);

print "\nPLAIN:\n";
var_export($row_plain);
print "\n";

rewind($fh);

$row_after = fgetcsv($fh, 500,$delimiter,$enclosure, $escape_char);

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);

Output:

BEFORE:
array (
  0 => 'A',
  1 => '[YaYXYZZYXYbY]',
  2 => 'B',
)

PLAIN:
'AXY[YYaYYXYYZZYXYYbYY]YXB
'

AFTER:
array (
  0 => 'A',
  1 => '[YaYXYZZ',
  2 => 'bYY]Y',
  3 => 'B',
)

Could fputcsv() be fixed?

For this let's turn back to more common and readable delimiter, enclosure and escape character.

$delimiter = ',';
$enclosure = '"';
$escape_char = "@";

Here the result is:

BEFORE:
array (
  0 => 'A',
  1 => '["a","@@","b"]',
  2 => 'B',
)

PLAIN:
'A,"[""a"",""@@",""b""]",B
'

AFTER:
array (
  0 => 'A',
  1 => '["a","@@',
  2 => 'b""]"',
  3 => 'B',
)

We see that the '"@@"' part is exported as '""@@"', while it SHOULD have been exported as '""@@""'.

In fact, doing this manually with fwrite() instead of fputcsv() does fix the problem: https://3v4l.org/4U1CQ

0
votes

Using your code with specific delimiters but changing the following line will work...

$enclosure = "'";

I think it may be to do with thinking that the \ is escaping the following quote.

0
votes

As in php, \\ used to escape the backslash(link for PHP manual escape sequence),so for making it as string u need to use one more single quote(' ').

so your input array should be...

$row_before = ['A', json_encode(['a', "'\\'", 'b']), 'B'];
0
votes

This is not PHP bug. It seems that json_encode() use the same delimiter (,), enclosure (") and escape (\) which is the same as default delimiter, enclosure and escape for both fputcsv() and fgetcsv(). You may differentiate enclosure or escape, and delimiter if necessary.

As already answered, in this case it will work by specify enclosure with (') instead:

$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

$fh = fopen($file = 'php://temp', 'rb+');

fputcsv($fh, $row_before, ',', "'");

rewind($fh);

$row_after = fgetcsv($fh, 0, ',', "'");

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);