0
votes

After hours of searching, I can't find a way to force a file to be saved as UTF-8. If the string contains any character that cannot be represented in ASCII, the file is saved as UTF-8, but if every character is available in both ASCII and UTF-8, the file is saved as ASCII:

file_put_contents("test1.xml", "test"); // Saved as ASCII
file_put_contents("test2.xml", "test&"); // Saved as ASCII
file_put_contents("test3.xml", "tëst&"); // Saved as UTF-8

I can add a BOM to force a UTF-8 file, but the receiver of the document does not accept a BOM:

file_put_contents("utf8-force.xml", "\xEF\xBB\xBFtest&"); // Stored as UTF-8 because of the BOM

I checked the encoding with this short snippet:

exec('file -I ' . escapeshellarg($file), $output); // escape the path so spaces and shell characters are safe
print_r($output);

Since the character & is a single byte in ASCII and a two-byte character in UTF-8, the receiver can't read the file. Is there a way to force a file to UTF-8 without a BOM in PHP?

1
That's not how UTF-8 works: an "ASCII" file is byte-for-byte identical to a UTF-8 file if you only use code points up to 127. UTF-8 files categorically do not need BOMs, and your receiver is the problem in this situation. - Sammitch
The receiver was indeed the problem, since they always want a UTF-8 file. The solution in this particular case was to add a character that doesn't exist in ASCII (ë, é, etc.) to an attribute of the XML. - Stefan
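
The point made in the comments can be checked directly in PHP. The following is a minimal sketch, assuming the mbstring extension is available; the helper name `describeBytes` is hypothetical and not part of the original thread. It shows that an ASCII-only string is already valid UTF-8 with exactly the same bytes, and that `&` stays a single byte either way.

```php
<?php
// Hypothetical helper: returns the raw bytes as hex and whether the
// string is valid ASCII and valid UTF-8 (requires the mbstring extension).
function describeBytes(string $s): array
{
    return [
        'hex'   => bin2hex($s),
        'ascii' => mb_check_encoding($s, 'ASCII'),
        'utf8'  => mb_check_encoding($s, 'UTF-8'),
    ];
}

$plain    = describeBytes("test&");        // only code points up to 127
$accented = describeBytes("t\xC3\xABst&"); // "tëst&" written as UTF-8 bytes

print_r($plain);
print_r($accented);
```

Because the ASCII-only string passes both checks, a tool such as `file` has no way to tell the two encodings apart and reports the narrower one.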

1 Answer

-2
votes

file_put_contents will not convert the encoding. You have to convert the string explicitly with mb_convert_encoding.

Try this:

$data = 'test';
// 'OLD-ENCODING' is a placeholder for the actual source encoding of $data
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents("test1.xml", $data);

Or you can try a stream filter with stream_filter_append:

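$data = 'test';
$file = fopen('test.xml', 'w'); // open the target file for writing
// Convert from the source encoding ('OLD-ENCODING' is a placeholder) to UTF-8 on write
stream_filter_append($file, 'convert.iconv.OLD-ENCODING/UTF-8', STREAM_FILTER_WRITE);
fwrite($file, $data);
fclose($file); // closing the stream flushes the filter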
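
As a concrete instance of the explicit conversion described above, here is a hedged sketch. It assumes the input is ISO-8859-1, which is an assumption made purely for illustration since the answer leaves the source encoding as a placeholder, and it writes to a temporary file. The function name `saveAsUtf8` is hypothetical, and the mbstring extension is assumed to be installed.

```php
<?php
// Hypothetical helper: converts $data from the encoding $from to UTF-8
// and writes the result to $path. No BOM is added at any point.
function saveAsUtf8(string $path, string $data, string $from): string
{
    $utf8 = mb_convert_encoding($data, 'UTF-8', $from);
    file_put_contents($path, $utf8);
    return $utf8;
}

// Assumed input: "tëst&" encoded as ISO-8859-1 (ë is the single byte 0xEB).
$path    = tempnam(sys_get_temp_dir(), 'utf8');
$written = saveAsUtf8($path, "t\xEBst&", 'ISO-8859-1');
$onDisk  = file_get_contents($path);
unlink($path);

// Hex dump of what was actually stored on disk.
echo bin2hex($onDisk), PHP_EOL;
```

For input that contains only ASCII characters, the conversion returns the same bytes it was given, so it cannot change what a tool like `file` reports.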