When I try to convert an XML file from UTF-16 encoding to ISO-8859-1, I see broken characters like Â.

Can you suggest a way to get rid of the broken characters? I want the XML in ISO-8859-1 encoding.

Here is my code:

using (SqlConnection sqlConnection = new SqlConnection(ConfigurationManager.AppSettings.Get("SqlConn")))
{
    sqlConnection.Open();

    using (SqlCommand sqlCommand = new SqlCommand())
    {
        sqlCommand.CommandTimeout = 0;
        sqlCommand.CommandText = commandText;
        sqlCommand.Connection = sqlConnection;

        // the data from the database is UTF-16 encoded
        using (StreamWriter textwriterISO = new StreamWriter(path + "_out.XML", false, Encoding.GetEncoding("ISO-8859-1")))
        {                                  
            SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
            Console.WriteLine("Writing results. This could take a very long time.");
            while (sqlDataReader.Read())
            {
                for (int i = 0; i < sqlDataReader.FieldCount; i++)
                {
                    byte[] arr = System.Text.Encoding.GetEncoding(28591).GetBytes(sqlDataReader[i].ToString());
                    string ascii = Encoding.GetEncoding("UTF-8").GetString(arr);
                    textwriterISO.WriteLine(sqlDataReader.GetName(i), ascii);
                }

                textwriterISO.Flush();
            }
        }
    }                         
}
Could we trouble you for some of the code you're using? We don't even know what programming language you're using. - JLRishe
@JLRishe, I am using C# for the conversion. - yamuna
And would you mind showing us your code, or do you want us to do all the work for you? - JLRishe

1 Answer

Your code misuses the StreamWriter class and applies the wrong kind of manual encoding to the DB data. You are converting the source UTF-16 DB data to CP28591 (ISO-8859-1), interpreting those CP28591 bytes as UTF-8 to convert them back to UTF-16, and then having StreamWriter convert the now-malformed UTF-16 to ISO-8859-1 when it writes to the file. That is the wrong thing to do, quite apart from the wasted overhead of all those conversions. Let StreamWriter encode the source UTF-16 DB data directly and get rid of everything else, e.g.:

using (StreamWriter textwriterISO = new StreamWriter(path + "_out.XML", false, Encoding.GetEncoding("ISO-8859-1")))
{                                  
    SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
    Console.WriteLine("Writing results. This could take a very long time.");
    while (sqlDataReader.Read())
    {
        for (int i = 0; i < sqlDataReader.FieldCount; i++)
        {
            // you were originally calling the WriteLine(String, Object) overload.
            // Are you sure you want to call that? It interprets the first parameter
            // as a pattern to format the value of the second parameter. A DB column
            // name is not a formatting pattern!
            textwriterISO.WriteLine(sqlDataReader.GetName(i), sqlDataReader[i].ToString());

            // perhaps you meant to write the DB column name and field value separately?
            //
            // textwriterISO.WriteLine(sqlDataReader.GetName(i));
            // textwriterISO.WriteLine(sqlDataReader[i].ToString());
        }
        textwriterISO.Flush();
    }
}
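
To see why the manual conversion in the question damages the text, here is a minimal sketch that puts the two paths side by side. The class name, the method names, and the sample string "café" are made up for illustration; only the Encoding calls come from the code above.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripDemo
{
    // Reproduces the conversion from the question: encode the string as
    // ISO-8859-1 (code page 28591), then decode those bytes as if they were UTF-8.
    public static string MangledRoundTrip(string value)
    {
        byte[] latin1Bytes = Encoding.GetEncoding(28591).GetBytes(value);
        return Encoding.UTF8.GetString(latin1Bytes);
    }

    // The direct conversion: the same encoding step a StreamWriter
    // constructed with ISO-8859-1 applies to the strings it is given.
    public static byte[] DirectEncode(string value)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(value);
    }

    public static void Main()
    {
        string sample = "caf\u00E9"; // "café", a hypothetical field value

        // The lone 0xE9 byte is not a valid UTF-8 sequence, so the decoder
        // substitutes a replacement character and the original text is lost.
        string mangled = MangledRoundTrip(sample);
        Console.WriteLine(mangled == sample); // prints "False"

        // Encoding once, directly, keeps the character as a single byte.
        Console.WriteLine(BitConverter.ToString(DirectEncode(sample))); // prints "63-61-66-E9"
    }
}
```

Pure ASCII values pass through both paths unchanged, which is why the damage only shows up on accented or other non-ASCII characters.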

With that said, you mention that you want the output in XML format. StreamWriter by itself is not going to output XML for you. Use the XmlSerializer or XmlTextWriter class instead to convert your DataReader data into XML that is then written to your StreamWriter.
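
As a rough sketch of that idea, the following writes each row as XML through the StreamWriter. It uses XmlWriter.Create, which Microsoft recommends over instantiating XmlTextWriter directly, so it is a substitute for the classes named above rather than the same approach. The class name, the element names "rows" and "row", the in-memory DataTable, and the output path are assumptions made for illustration; in the real code the reader would be the SqlDataReader.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

public static class Latin1XmlExport
{
    // Writes every row of the reader as a <row> element with one child
    // element per column. "rows" and "row" are hypothetical names.
    public static void WriteRows(IDataReader reader, TextWriter output)
    {
        var settings = new XmlWriterSettings { Indent = true };

        // The XmlWriter hands characters to the TextWriter, which does the
        // actual byte encoding. CloseOutput defaults to false, so the caller
        // still owns (and disposes) the TextWriter.
        using (XmlWriter xml = XmlWriter.Create(output, settings))
        {
            xml.WriteStartDocument();
            xml.WriteStartElement("rows");
            while (reader.Read())
            {
                xml.WriteStartElement("row");
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    // EncodeLocalName escapes column names that are not valid XML names.
                    string elementName = XmlConvert.EncodeLocalName(reader.GetName(i));
                    xml.WriteElementString(elementName, Convert.ToString(reader[i]));
                }
                xml.WriteEndElement();
            }
            xml.WriteEndElement();
            xml.WriteEndDocument();
        }
    }

    public static void Main()
    {
        // Stand-in data so the sketch runs without a database.
        var table = new DataTable();
        table.Columns.Add("name", typeof(string));
        table.Rows.Add("caf\u00E9");

        string path = Path.Combine(Path.GetTempPath(), "sketch_out.XML"); // hypothetical path
        using (var textwriterISO = new StreamWriter(path, false, Encoding.GetEncoding("ISO-8859-1")))
        using (IDataReader reader = table.CreateDataReader())
        {
            WriteRows(reader, textwriterISO);
        }
    }
}
```

The XmlWriter also takes care of escaping characters such as < and & in field values, which plain WriteLine calls would not do.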