I need to read an XML file from the web that is encoded as ISO-8859-1. After creating an XmlDocument from it, I tried to convert some of its InnerText values to UTF-8, but that didn't work. Then I tried to change the encoding used with the HttpClient response. The response string is now properly formatted, but when I create the XmlDocument, the app either crashes with an exception (HRESULT: 0xC00CE55F) or the XML string contains unexpected characters. How can I solve this issue?

Code Snippet:

private static async Task<string> GetResultsAsync(string uri)
        {
            var client = new HttpClient();
            var response = await client.GetByteArrayAsync(uri);
            var responseString = Encoding.GetEncoding("iso-8859-1").GetString(response, 0, response.Length - 1);
            return responseString;
        }

public static async Task GetPodcasts(string url)
        {
            var progrmas = await GetGroupAsync("prog");
            HttpClient client = new HttpClient();

            //Task<string> pedido = client.GetStringAsync(url);
            //string res = await pedido; //Gets the string with the wrong chars, LoadXml doesn't fail

            string res = await GetResultsAsync(url); //Gets the string properly formatted
            XmlDocument doc = new XmlDocument();

            doc.LoadXml(res);  //Crashes here
            XmlElement root = doc.DocumentElement;

            XmlNodeList nodes = root.SelectNodes("//item");

            //Title
            var node_titles = root.SelectNodes("//item/title");
            IEnumerable<string> query_titles = from nodess in node_titles select nodess.InnerText;
            List<string> list_titles = query_titles.ToList();
            //........

            for (int i = 0; i < list_titles.Count; i++)
            {
                PodcastItem podcast = new PodcastItem();
                string title = list_titles[i];


                //First attempt to convert a field from the XmlDocument, with the wrong chars. Only replaces the bad encoding with a '?':

                //Encoding iso = Encoding.GetEncoding("ISO-8859-1");
                //Encoding utf8 = Encoding.UTF8;
                //byte[] utfBytes = utf8.GetBytes(title);
                //byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
                //string msg = iso.GetString(isoBytes, 0, isoBytes.Length - 1);

                PodcastItem dataItem = new PodcastItem(title + pubdate, title, link, description, "", pubdate);
                progrmas.Items.Add(dataItem);
            }

        }
What is title? It's really unclear what you're trying to do. Also note that XmlDocument and XDocument are different classes. If you've already converted the document into a string, it may be too late - you should give it in its original binary representation (e.g. as a Stream), and let the XML parser handle the decoding. - Jon Skeet
I have corrected the issues you mentioned. - cap7
And have you tried just giving the binary data to the XmlDocument? Does the XML file advertise an ISO-8859-1 encoding? (Is the document publicly accessible, so we could look for ourselves?) A short but complete program demonstrating the problem would really help. - Jon Skeet
Current code with the url for the document: pastebin.com/sPbxTShC - cap7
That's not a short but complete program, and you should include it in the question. - Jon Skeet

1 Answer

I'm not sure why you are handling the encoding yourself, but the reason it crashes on you is probably that you never decode the last byte of the array: passing response.Length - 1 as the count drops the final byte, so the XML string is truncated. This code works for me:

    static async Task<string> LoadDecoded()
    {
        var client = new HttpClient();
        var response = await client.GetByteArrayAsync("http://www.rtp.pt/play/podcast/469");
        var responseString = Encoding
           .GetEncoding("iso-8859-1")
           .GetString(response, 0, response.Length); // no -1 here, we want all bytes!
        return responseString;
    }
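
To see the effect of the missing byte without touching the network, here is a minimal sketch. The sample document, the class name TruncationDemo and the helper TryLoad are made up for illustration, and it uses System.Xml.XmlDocument, so on WinRT the exception type will differ:

```csharp
using System;
using System.Text;
using System.Xml;

public static class TruncationDemo
{
    // Hypothetical sample document standing in for the real feed.
    public const string SampleXml =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
        "<rss><item><title>Not\u00EDcias</title></item></rss>";

    // Returns true when the string parses as XML, false when the parser rejects it.
    public static bool TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] bytes = latin1.GetBytes(SampleXml);

        // Decoding every byte keeps the closing '>' of the root element.
        string full = latin1.GetString(bytes, 0, bytes.Length);

        // Decoding Length - 1 bytes cuts that last character off.
        string truncated = latin1.GetString(bytes, 0, bytes.Length - 1);

        Console.WriteLine("full parses: " + TryLoad(full));
        Console.WriteLine("truncated parses: " + TryLoad(truncated));
    }
}
```

The third argument of GetString is a count, not an end index, which is why response.Length is the right value when the offset is 0.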

If I let HttpClient work out the encoding itself, your code also works for me:

    static async Task<string> Load()
    {
        var hc = new HttpClient();
        string s = await hc.GetStringAsync("http://www.rtp.pt/play/podcast/469");
        return s;
    }

    static void Main(string[] args)
    {

        var xd = new XmlDocument();
        string res = Load().Result;
        xd.LoadXml(res);
        var node_titles = xd.DocumentElement.SelectNodes("//item/title");

        Console.WriteLine(node_titles.Count);
    }

If you are on a non-mobile, non-WinRT platform, XmlDocument.Load accepts a stream and does the same:

    static async Task<Stream> LoadStream()
    {
        var hc = new HttpClient();
        var stream = await hc.GetStreamAsync("http://www.rtp.pt/play/podcast/469");
        return stream;
    }

    static void Main(string[] args)
    {

        var xd2 = new XmlDocument();
        xd2.Load(LoadStream().Result);

        var node_titles2 = xd2.DocumentElement.SelectNodes("//item/title");

        Console.WriteLine(node_titles2.Count);
    }
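
If you want to check the stream approach offline, the following sketch feeds raw ISO-8859-1 bytes to XmlDocument.Load through a MemoryStream, so the parser picks the encoding from the XML declaration. The sample document, the class name StreamDecodingDemo and the helper ReadFirstTitle are assumptions for illustration, not part of your feed:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamDecodingDemo
{
    // Loads the raw bytes as a stream and returns the first item title,
    // or an empty string when the document has no such node.
    public static string ReadFirstTitle(byte[] rawXml)
    {
        using (var stream = new MemoryStream(rawXml))
        {
            var doc = new XmlDocument();
            doc.Load(stream); // the parser decodes the bytes itself
            XmlNode title = doc.SelectSingleNode("//item/title");
            return title?.InnerText ?? "";
        }
    }

    public static void Main()
    {
        // Hypothetical sample document; \u00ED is a single byte (0xED) in ISO-8859-1.
        string xml =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
            "<rss><item><title>Not\u00EDcias</title></item></rss>";
        byte[] raw = Encoding.GetEncoding("iso-8859-1").GetBytes(xml);

        // Compare against the expected text instead of printing it,
        // so the console code page does not affect the result.
        Console.WriteLine(ReadFirstTitle(raw) == "Not\u00EDcias");
    }
}
```

No Encoding call is needed on the reading side here; the bytes stay untouched until the parser sees them.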

This is the result in my console: [image: console output of the encoded XML]

Are you sure you are not encoding somewhere else as well?

As general advice: the framework classes can handle the most common encoding scenarios. Try to let them do the work instead of fiddling with the Encoding classes yourself.