
I have a problem with the POST request that Bitbucket sends to a personal page whenever a commit is made: the accented characters are replaced by sequences of letters and digits such as u00e9.

Here is the message I use for the commit: Démo test éà

And here is what my page gets: Du00e9mo test u00e9u00e0

  • I tried using utf8_decode, utf8_encode, iconv (with UTF-8 and ISO-8859-1) and other functions suggested in posts I found
  • I saved my script in UTF-8
  • I tried using header('Content-Type: text/html; charset=UTF-8');
Have you tried url encoding? - Maks3w

1 Answer


This is exactly what happens if you remove the backslashes (\) from a JSON-encoded string. The character encoding itself is fine: é really is U+00E9, which JSON writes as \u00e9 (è would be \u00e8).

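If your code calls stripslashes on the payload, remove that call. If you really do need it (and it shouldn't be necessary), apply it to the decoded values after json_decode, for example with a mapping function such as array_map, never to the raw JSON text.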
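A minimal sketch of the safe order, assuming the raw JSON text is already available as a string. Here $raw is built with json_encode only to stand in for the request body, and the message key is a hypothetical field name, not necessarily the structure of Bitbucket's real payload:

```php
<?php
// Stand-in for the raw JSON text received in the POST request (hypothetical).
// By default json_encode escapes é as \u00e9, just like the real payload.
$raw = json_encode(["message" => "Démo test éà"]);

// Decode first: json_decode turns the \u00e9 escapes back into UTF-8 characters.
$data = json_decode($raw, true);

// Only if you really need it: strip slashes from the decoded values,
// never from the JSON text itself. With a single array, array_map keeps the keys.
$clean = array_map('stripslashes', $data);

print $clean['message'] . "\n";
```

Because the backslashes are consumed by json_decode before stripslashes ever runs, the escapes survive and the accented characters come out intact.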

This is what a rogue stripslashes would do:

<?php print json_decode(stripslashes(json_encode("Démo test éà"))) . "\n"; ?>

Du00e9mo test u00e9u00e0

If you have no control over the interface, you can try running the process in reverse to recover the correct string. This is a rather monstrous hack and not very robust, so I'd use it only as a last resort:

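<?php

$string = "Du00e9mo test u00e9u00e0";

// Put a backslash back in front of every "u" followed by four hex digits,
// turning u00e9 into \u00e9 inside the JSON-encoded string.
$correct = preg_replace("/u([0-9a-f]{4})/", '\\u\\1', json_encode($string));

// Decoding now interprets the restored \uXXXX escapes as Unicode characters.
$string = json_decode($correct);

print "Output: $string\n";

?>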

Output: Démo test éà
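To see why the reverse hack is fragile, here is a sketch that wraps the same substitution in a hypothetical helper, unstrip_unicode, and runs it on an ordinary ASCII word as well. The regular expression cannot tell a stripped \u escape from a plain "u" that happens to be followed by four hex-looking letters, so in "Subdeacon" the substring "ubdea" is turned into the character U+BDEA:

```php
<?php
// Hypothetical helper wrapping the hack above: re-insert the backslash
// before every "u" followed by four lowercase hex digits, then decode.
function unstrip_unicode(string $s): string
{
    $json = preg_replace('/u([0-9a-f]{4})/', '\\u\\1', json_encode($s));
    return json_decode($json);
}

// The intended case: the stripped escapes are restored.
$fixed = unstrip_unicode("Du00e9mo test u00e9u00e0");

// A false positive: "ubdea" looks like a stripped escape, so the word is corrupted.
$broken = unstrip_unicode("Subdeacon");

print "Fixed:  $fixed\n";
print "Broken: $broken\n";
```

Any commit message containing such a sequence (including a literal "u00e9" typed on purpose) will be altered, which is why fixing the stripslashes call is the better option.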