109
votes

I have googled on this topic and I have looked at every answer, but I still don't get it.

Basically I need to convert UTF-8 string to ISO-8859-1 and I do it using following code:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string msg = iso.GetString(utf8.GetBytes(Message));

My source string is

Message = "ÄäÖöÕõÜü"

But unfortunately my result string becomes

msg = "�ä�ö�õ�ü

What I'm doing wrong here?

9
All strings in .NET internally store the strings using unicode characters. There is no notion of a String being "windows-1252", "iso-8859-1", "utf-8", etc. Are you trying to throw away any characters in your string that do not have a representation in the Windows-1252 code page?Ian Boyd
@IanBoyd Actually, a String is a counted sequence of UTF-16 code units. (Unfortunately, the term Unicode has been misapplied in Encoding.Unicode and in the Win32 API. Unicode is a character set, not an encoding. UTF-16 is one of several encodings for Unicode.)Tom Blodget
You make incorrect action: you make byte array in utf8 encoding, but read them by iso decode. If you want make string with encoded symbols it simple call string msg = iso.GetString(iso.GetBytes(Message));StuS
That's called Mojibake.Rick James
I guess what Daniil is saying is that Message was decoded from UTF-8. Assuming that part worked correctly, converting to Latin-1 is as simple as byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(Message). Then, like StuS says, you can convert the Latin-1 bytes back to UTF-16 with Encoding.GetEncoding("ISO-8859-1").GetString(bytes)Qwertie

9 Answers

186
votes

Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
27
votes

I think your problem is that you assume that the bytes that represent the utf8 string will result in the same string when interpreted as something else (iso-8859-1). And that is simply just not the case. I recommend that you read this excellent article by Joel spolsky.

16
votes

Try this:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8,iso,utfBytes);
string msg = iso.GetString(isoBytes);
8
votes

You need to fix the source of the string in the first place.

A string in .NET is actually just an array of 16-bit unicode code-points, characters, so a string isn't in any particular encoding.

It's when you take that string and convert it to a set of bytes that encoding comes into play.

In any case, the way you did it, encoded a string to a byte array with one character set, and then decoding it with another, will not work, as you see.

Can you tell us more about where that original string comes from, and why you think it has been encoded wrong?

7
votes

Seems bit strange code. To get string from Utf8 byte stream all you need to do is:

string str = Encoding.UTF8.GetString(utf8ByteArray);

If you need to save iso-8859-1 byte stream to somewhere then just use: additional line of code for previous:

byte[] iso88591data = Encoding.GetEncoding("ISO-8859-1").GetBytes(str);
0
votes

Just used the Nathan's solution and it works fine. I needed to convert ISO-8859-1 to Unicode:

string isocontent = Encoding.GetEncoding("ISO-8859-1").GetString(fileContent, 0, fileContent.Length);
byte[] isobytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(isocontent);
byte[] ubytes = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.Unicode, isobytes);
return Encoding.Unicode.GetString(ubytes, 0, ubytes.Length);
0
votes
Encoding targetEncoding = Encoding.GetEncoding(1252);
// Encode a string into an array of bytes.
Byte[] encodedBytes = targetEncoding.GetBytes(utfString);
// Show the encoded byte values.
Console.WriteLine("Encoded bytes: " + BitConverter.ToString(encodedBytes));
// Decode the byte array back to a string.
String decodedString = Encoding.Default.GetString(encodedBytes);
0
votes

Maybe it can help
Convert one codepage to another:

    public static string fnStringConverterCodepage(string sText, string sCodepageIn = "ISO-8859-8", string sCodepageOut="ISO-8859-8")
    {
        string sResultado = string.Empty;
        try
        {
            byte[] tempBytes;
            tempBytes = System.Text.Encoding.GetEncoding(sCodepageIn).GetBytes(sText);
            sResultado = System.Text.Encoding.GetEncoding(sCodepageOut).GetString(tempBytes);
        }
        catch (Exception)
        {
            sResultado = "";
        }
        return sResultado;
    }

Usage:

string sMsg = "ERRO: Não foi possivel acessar o servico de Autenticação";
var sOut = fnStringConverterCodepage(sMsg ,"ISO-8859-1","UTF-8"));

Output:

"Não foi possivel acessar o servico de Autenticação"
-5
votes

Here is a sample for ISO-8859-9;

protected void btnKaydet_Click(object sender, EventArgs e)
{
    Response.Clear();
    Response.Buffer = true;
    Response.ContentType = "application/vnd.openxmlformatsofficedocument.wordprocessingml.documet";
    Response.AddHeader("Content-Disposition", "attachment; filename=XXXX.doc");
    Response.ContentEncoding = Encoding.GetEncoding("ISO-8859-9");
    Response.Charset = "ISO-8859-9";
    EnableViewState = false;


    StringWriter writer = new StringWriter();
    HtmlTextWriter html = new HtmlTextWriter(writer);
    form1.RenderControl(html);


    byte[] bytesInStream = Encoding.GetEncoding("iso-8859-9").GetBytes(writer.ToString());
    MemoryStream memoryStream = new MemoryStream(bytesInStream);


    string msgBody = "";
    string Email = "[email protected]";
    SmtpClient client = new SmtpClient("mail.xxxxx.org");
    MailMessage message = new MailMessage(Email, "[email protected]", "ONLINE APP FORM WITH WORD DOC", msgBody);
    Attachment att = new Attachment(memoryStream, "XXXX.doc", "application/vnd.openxmlformatsofficedocument.wordprocessingml.documet");
    message.Attachments.Add(att);
    message.BodyEncoding = System.Text.Encoding.UTF8;
    message.IsBodyHtml = true;
    client.Send(message);}