I am using a TextReader to read Chinese text from the internet, but I'm receiving incorrect characters.

For example, I get back 您好 ï¼ instead of 轉注字. Also, if I parse German strings, I receive Sie kÃ¶nnen instead of Sie können.

This is the original string from the website:

<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Sie können einige Blumen auswählen</string>

It is UTF-8 encoded. How do I resolve this encoding issue?

Regards

This looks like a UTF-8 encoded byte stream was decoded using an 8-bit encoding (probably one of the ISO-8859-* family or one of its Windows-* counterparts). I don't know enough about .NET to tell you the correct way to fix it, however. - Joachim Sauer
Damn Stack Overflow restriction that I am not allowed to answer with fully formatted comments!!! - goldengel
Thanks for the ideas. The problem was that the web client decodes downloaded text as ISO-8859-1 by default. I needed UTF-8 instead but wasn't sure; I thought it was the other way around. The solution is simply to set the client's encoding before downloading the string (no binary download needed). - goldengel
Damn Stack Overflow restriction that I cannot format my answer because I need to wait 8 hours before answering my own question. How should others know that I have already found a good answer? - goldengel
...
Dim U As Uri = CreateUri(item.German)
Me.Web.Encoding = System.Text.Encoding.UTF8 ' System.Text.Encoding.GetEncoding(1252) 'System.Text.Encoding.ASCII ' System.Text.Encoding.GetEncoding("ISO-8859-1")
Me.Web.DownloadStringAsync(U)
Result: Ich pflücke glücklich blümchen, which is correct! - goldengel
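The diagnosis in the comments (UTF-8 bytes decoded with an 8-bit encoding) can be reproduced in isolation. This is a minimal C# sketch, assuming ISO-8859-1 was the wrong decoder as goldengel reports; the class and method names (MojibakeDemo, DecodeUtf8AsLatin1, RepairLatin1Mojibake) are hypothetical and only for illustration. It produces the garbled form of "Sie können" and shows how that particular mistake can be undone.

```csharp
using System;
using System.Text;

// Hypothetical demo class; names are illustrative, not from the question.
public static class MojibakeDemo
{
    // ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same value,
    // so decoding with it never fails and never loses bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Simulates the bug: the text is sent as UTF-8 but decoded as ISO-8859-1.
    public static string DecodeUtf8AsLatin1(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        return Latin1.GetString(utf8Bytes);
    }

    // Undoes that specific mistake: re-encoding as ISO-8859-1 recovers the
    // original bytes, which are then decoded correctly as UTF-8.
    public static string RepairLatin1Mojibake(string garbled)
    {
        byte[] originalBytes = Latin1.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string garbled = DecodeUtf8AsLatin1("Sie können");
        // garbled == "Sie kÃ¶nnen": the two UTF-8 bytes of "ö" (0xC3 0xB6) became two characters
        string repaired = RepairLatin1Mojibake(garbled);
        // repaired == "Sie können"
        Console.WriteLine(garbled);
        Console.WriteLine(repaired);
    }
}
```

The repair only works here because ISO-8859-1 maps every byte to a character, so nothing is lost; setting the correct encoding before decoding, as goldengel did with Me.Web.Encoding = System.Text.Encoding.UTF8, is the more robust fix.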

Answer

Try initializing your TextReader (here a StreamReader) with the appropriate encoding:

// Requires: using System.IO; using System.Text;
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    // read the text, e.g.:
    string text = reader.ReadToEnd();
}
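To see why the explicit encoding matters, here is a small self-contained check that reads the same UTF-8 bytes twice, once as ISO-8859-1 and once as UTF-8. It uses a MemoryStream in place of the real response stream (an assumption; the question does not show how stream is obtained), and ReaderEncodingDemo and ReadAll are hypothetical names. Note that the StreamReader(Stream, Encoding) constructor still honors a byte order mark if the stream starts with one; the encoding argument applies when none is present.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; names are illustrative, not from the answer.
public static class ReaderEncodingDemo
{
    // Reads all text from a stream using the given encoding.
    // Disposing the reader also disposes the underlying stream.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumed payload: the UTF-8 bytes a server would send (no BOM).
        byte[] payload = Encoding.UTF8.GetBytes("Sie können einige Blumen auswählen");

        // Wrong: decoding UTF-8 bytes as ISO-8859-1 yields "Sie kÃ¶nnen einige Blumen auswÃ¤hlen".
        string wrong = ReadAll(new MemoryStream(payload), Encoding.GetEncoding("ISO-8859-1"));
        // Right: decoding with the encoding the bytes were written in.
        string right = ReadAll(new MemoryStream(payload), Encoding.UTF8);

        Console.WriteLine("ISO-8859-1: " + wrong);
        Console.WriteLine("UTF-8:      " + right);
    }
}
```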