1
votes

I have written a PHP script.
I run it via the following URL: http://server/script.php?param1=%80t%80

This passes a GET parameter named param1 to my PHP script.
param1 contains the string "€t€", which is URL-encoded as "%80t%80".

My PHP script file is saved as UTF-8.
I was wondering which character encoding applies to the string contained in $_GET["param1"].

The character encoding of $_GET["param1"] is certainly not UTF-8.
The reason: the following command in my PHP script outputs "80 74 80", which is the hexadecimal representation of $_GET["param1"].

var_dump(unpack("H*", $_GET["param1"]));

If the character encoding of $_GET["param1"] were UTF-8, the previous PHP command would output "e2 82 ac 74 e2 82 ac".

The character encoding of $_GET["param1"] is not ISO-8859-1 either, because the € symbol is not included in the ISO-8859-1 charset.
To view the ISO-8859-1 encoding table, go to http://en.wikipedia.org/wiki/ISO/IEC_8859-1
So the PHP internal encoding returned by the mb_internal_encoding function, which is ISO-8859-1, does not apply to $_GET["param1"].

Does anyone know which character encoding applies to the string contained in $_GET["param1"]?


2 Answers

0
votes

I am not sure I understand why you are using unpack to investigate a character-encoding problem, but here goes...

I suppose you are reading the value of $_GET['param1'] with something like:

$var = $_GET['param1'];

I suggest you try urldecode:

$var = urldecode($_GET['param1']);

and then use the functions for handling multibyte strings (http://gr.php.net/manual/en/ref.mbstring.php) or the iconv functions.
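A minimal sketch of the multibyte-string route, under assumptions that are not stated in the question: that the 0x80 bytes come from a client using Windows-1252 (where 0x80 is the euro sign) and that the mbstring extension is loaded. PHP has already percent-decoded the values in $_GET, so the sketch works on the raw bytes directly. The helper name toUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: return $raw as UTF-8, assuming that any input which
// is not valid UTF-8 is Windows-1252. This is an assumption about the
// client; PHP itself has no way of knowing the intended encoding.
function toUtf8(string $raw): string
{
    // Valid UTF-8 is passed through unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise reinterpret the bytes as Windows-1252 and convert.
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// "\x80t\x80" stands in for $_GET['param1'] after ?param1=%80t%80
$utf8 = toUtf8("\x80t\x80");
echo bin2hex($utf8), "\n"; // e282ac74e282ac
```

Guessing the source encoding is fragile. If you control the page that builds the link, having it send UTF-8 in the first place avoids the conversion entirely.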

Hope the above helps.

0
votes

"The character encoding of $_GET["param1"] is certainly not UTF-8. The reason: the following command in my PHP script outputs "80 74 80", which is the hexadecimal representation of $_GET["param1"]."

This is exactly what you'd expect, because it's what you've written. The parameter %80t%80 means three bytes: hex 80, "t" (hex 74), hex 80. %80 means "the byte with hex value 80". You're manually specifying the byte values, so character encoding doesn't come into this at all.

Try this:

var_dump(unpack("H*", urldecode("%80t%80")));

And this:

http://server/script.php?param1=%e2%82%ac%74%e2%82%ac
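To compare what the URL from the question and the one above put into $_GET without setting up a web server, the decoding can be sketched on the command line. This assumes parse_str() fills its result array the same way PHP fills $_GET from a query string. rawurlencode() is included only to show one way the UTF-8 form of the URL could be generated.

```php
<?php
// Decode each query string the way PHP does for $_GET and dump the bytes.
foreach (['param1=%80t%80', 'param1=%e2%82%ac%74%e2%82%ac'] as $query) {
    parse_str($query, $params);
    echo $query, ' => ', bin2hex($params['param1']), "\n";
}
// param1=%80t%80 => 807480
// param1=%e2%82%ac%74%e2%82%ac => e282ac74e282ac

// Percent-encode the UTF-8 bytes of "€t€" (U+20AC is the euro sign).
echo rawurlencode("\u{20AC}t\u{20AC}"), "\n"; // %E2%82%ACt%E2%82%AC
```

In both cases the script receives exactly the bytes named in the URL. Whether those bytes form valid UTF-8 depends only on what the sender put there.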