This is an unusual scenario, not a conventional conversion from one encoding to another.
Question
I use the Readability API to retrieve the main content of a given URL. It works fine when the target page is encoded in UTF-8, but when the target page is encoded in GB2312 (a Chinese encoding), I get garbage instead: the Chinese characters are wrongly encoded, while English letters and digits come through fine.
Deep Research
I inspected the HTTP headers that the Readability API returns; they indicate that the API response is encoded in UTF-8.
Here's a snippet of wrongly encoded Chinese characters:
ÄÉ´ï¶û¾ø¾³ÏÂ´ó·´»÷¾Ü¾øÀäÃÅÄæ×ª½ú¼¶ÖÐÍøËÄÇ¿
Length: 42
The original text is:
纳达尔绝境下大反击拒绝冷门逆转晋级中网四强
Length: 21
However, if you convert the correct Chinese text into Unicode, it should be:
纳达尔绝境下大反击拒绝冷门逆转晋级中网四强
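The two lengths above (42 versus 21) are consistent with one possible explanation: each GB2312 byte was read as a single ISO-8859-1 character and the result was then stored as UTF-8. The following is only a sketch that reproduces that hypothesis locally. It assumes the mbstring extension is loaded and the script file is saved as UTF-8; it does not show what the Readability API actually does internally.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$original = '纳达尔绝境下大反击拒绝冷门逆转晋级中网四强';

// Encode the headline as GB2312: every Chinese character becomes two bytes.
$gbBytes = mb_convert_encoding($original, 'GB2312', 'UTF-8');

// Misread each of those bytes as one ISO-8859-1 character and store it as UTF-8.
$garbled = mb_convert_encoding($gbBytes, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($original, 'UTF-8'), "\n"; // 21
echo strlen($gbBytes), "\n";              // 42
echo mb_strlen($garbled, 'UTF-8'), "\n";  // 42
echo $garbled, "\n";                      // starts with ÄÉ´ï¶û
```

If the hypothesis holds, the garbled string is already valid UTF-8, which would explain why the response headers still report UTF-8.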
Tried But Not Working
iconv("GB2312", "UTF-8", $str);
iconv("GBK", "UTF-8", $str);
iconv("GB18030", "UTF-8", $str);
mb_convert_encoding($str, "UTF-8", "GB2312");
mb_convert_encoding($str, "UTF-8", "GB18030");
mb_convert_encoding($str, "UTF-8", "GBK");
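One possible reason the calls above fail is that they treat $str as raw GB-encoded bytes, whereas the response may already be UTF-8 text in which every original byte has become a separate ISO-8859-1 character. Under that assumption, a two-step conversion might recover the text: first map the characters back to single bytes, then decode those bytes as GB2312. The helper name `undo_latin1_misread` is hypothetical, and the sketch assumes mbstring is loaded and that no bytes were dropped or merged along the way.

```php
<?php
// Hypothetical helper, named here only for illustration.
// Assumption: $garbled is valid UTF-8 and all of its characters lie in the
// ISO-8859-1 range (U+0000 to U+00FF), i.e. each one stands for one lost byte.
function undo_latin1_misread($garbled, $sourceCharset = 'GB2312')
{
    // Step 1: turn each character back into the single byte it was read from.
    $rawBytes = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

    // Step 2: decode those bytes with the charset the page really used.
    return mb_convert_encoding($rawBytes, 'UTF-8', $sourceCharset);
}

$garbled = 'ÄÉ´ï¶û¾ø¾³';
echo undo_latin1_misread($garbled), "\n"; // 纳达尔绝境
```

If the service has already altered some bytes, for example by replacing invalid sequences, this reversal cannot restore the affected characters.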
Solution Requested
Since the Readability API doesn't provide a parameter for the charset of the target URL (I call the API like https://www.readability.com/api/content/v1/parser?url=http://sports.sina.com.cn/t/2013-10-04/14596813815.shtml&token=my_token_here), I have to do the conversion when handling the API response.
I would appreciate any ideas about this issue.
Environment Info: PHP 5.3.6