
I am having a problem reading XML with LINQ to XML in C#.

When I try to load an XML document with the following statement:

XDocument xdoc = XDocument.Load(path);

it throws the following exception:

Data at the root level is invalid. Line 1, position 1.

When I opened the XML file I was trying to read, I found an invalid character before the XML declaration. Here is the declaration:

?<?xml version="1.0" encoding="utf-8"?>

I know the question mark at the start of the declaration shouldn't be there.

I have three questions:

1) How can I read this invalid XML with LINQ to XML in C#?

2) How can I remove this kind of invalid character anywhere in the XML in C#?

3) How can I prevent these invalid characters from being written when creating the XML with LINQ to XML in C#?

XML sample: ?<?

Hex equivalent: 3f 3c 3f

And here is the code that I am using to create it:

XDocument xdoc = new XDocument();
xdoc.Add(new XElement("TaskAlert"));
AddParentNodeInTaskAlertXml(ref xdoc, userId);
and so on......

I can't understand why it sometimes adds characters like this.

Here is some code that I am using to create or load the file:

public static void CreateUpdateTaskAlertXmlFile(int userId)
{
    try
    {
        string path = string.Format("{0}\\{1}\\{2}", Application.StartupPath, "Configuration",
                                    "TaskAlert.xml");
        if (userId.Equals(0))
            userId = Utility.Application.CurrentUser.UserId;

        XDocument xdoc;
        LoadTaskAlertXml(out xdoc, path, userId);
        xdoc.Save(path);
    }
    catch (Exception exception)
    {
        MSLib.HandleException(exception);
    }
}

public static void LoadTaskAlertXml(out XDocument xdoc, string path, int userId)
{
    xdoc = null;
    TaskCollection tasks = TaskEntity.GetOverDueTasks(userId);
    if (!File.Exists(path))
    {
        CreateTaskAlertXml(userId.ToString(), ref xdoc);
        AddOverDueTasksInTaskAlertXml(xdoc, userId.ToString(), tasks, false);
    }
    else
    {
        xdoc = XDocument.Load(path);

        XElement userElement =
            xdoc.Descendants("User").Where(x => x.Attribute("Id").Value.Equals(userId.ToString())).
                SingleOrDefault();

        if (userElement == null)
        {
            AddParentNodeInTaskAlertXml(ref xdoc, userId.ToString());
            AddOverDueTasksInTaskAlertXml(xdoc, userId.ToString(), tasks, false);
        }
        else
            AddOverDueTasksInTaskAlertXml(xdoc, userId.ToString(), tasks, true);
    }
}
@Naveed: As Jon already said, please provide us with the hex value of the first character. - Daniel Hilgarth
The hex value of the first character is 3f. - Naveed Anjum
Could this be a Unicode byte order mark that was converted to ASCII as an unknown character, and so turned into "?"? This happens quite often when Unicode text is naively converted to ASCII one character at a time. - Stephen Chung
Thanks, Stephen Chung. I am simply creating the file using the code shown in my post. I am not specifying any encoding when creating the file. By default it is created with UTF-8 encoding. - Naveed Anjum

1 Answer

LINQ to XML wouldn't create an invalid file to start with, so question 3 is moot.

LINQ to XML is only designed to read valid XML. You should find out why you've ended up with invalid XML, and fix the root cause. It's generally a bad idea to try to fix an already-invalid file, especially without understanding how it became invalid - you never know what other problems might be lurking round the corner.

I suspect that the extra character was originally a byte order mark, but that it's been mangled by something else. If you can give us more information about how you've created the file in the first place, that would help a lot. LINQ to XML can read files which start with a valid BOM with no problems.
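
To illustrate how a byte order mark can end up as a question mark, here is a minimal sketch of one possible mangling path. It is an assumption for illustration, not a diagnosis of what happened to your file: the class and method names are made up, and the "something else" is imagined as a tool that decodes the text as UTF-8 and writes it back out as ASCII.

```csharp
using System;
using System.Text;

static class BomMangleDemo
{
    // Builds the bytes a UTF-8 file with a BOM would start with:
    // the preamble EF BB BF followed by the encoded text.
    public static byte[] Utf8WithBom(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(text);
        byte[] result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }

    // Hypothetical mangling step: decode as UTF-8 (GetString keeps the BOM
    // as the character U+FEFF), then re-encode as ASCII. ASCII cannot
    // represent U+FEFF, so the encoder substitutes '?' (0x3F).
    public static byte[] ReencodeAsAscii(byte[] utf8Bytes)
    {
        string decoded = Encoding.UTF8.GetString(utf8Bytes);
        return Encoding.ASCII.GetBytes(decoded);
    }

    static void Main()
    {
        byte[] original = Utf8WithBom("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        byte[] mangled = ReencodeAsAscii(original);

        Console.WriteLine(BitConverter.ToString(original, 0, 3)); // EF-BB-BF
        Console.WriteLine(BitConverter.ToString(mangled, 0, 3));  // 3F-3C-3F
    }
}
```

The mangled bytes match the 3f 3c 3f reported in the question, which is why a damaged BOM is a plausible suspect; other causes are not ruled out by this.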

I suggest you look at the file in a binary editor and edit your question with exactly the bytes at the start of the file. A valid UTF-8 BOM would be 0xEF, 0xBB, 0xBF.
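
If you don't have a binary editor to hand, a few lines of C# can do the same check. This is only a sketch: the class name, the method names and the default file name "TaskAlert.xml" are placeholders I've made up, so substitute your real path.

```csharp
using System;
using System.IO;

static class BomCheck
{
    // Reads up to 'count' bytes from the start of the file.
    // For a small count on a local file a single Read call is normally enough.
    public static byte[] ReadLeadingBytes(string path, int count)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[count];
            int read = stream.Read(buffer, 0, count);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }

    // True if the bytes begin with the UTF-8 BOM: EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    static void Main(string[] args)
    {
        // Placeholder path: pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "TaskAlert.xml";
        byte[] head = ReadLeadingBytes(path, 8);

        Console.WriteLine("First bytes: " + BitConverter.ToString(head));
        Console.WriteLine("Valid UTF-8 BOM: " + StartsWithUtf8Bom(head));
    }
}
```

Running it against the file immediately after Save, and again at the point where Load fails, should help narrow down when the bytes change.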

EDIT: It sounds like the bug is in the way you're creating the file. For example, this should be absolutely fine:

using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XDocument doc = new XDocument();
        doc.Add(new XElement("Test"));
        doc.Save("test.xml");
    }
}

That creates a file with a valid byte order mark. Please show an equivalent program which doesn't, or investigate exactly what you're doing with the file, e.g. copying via FTP.

As an aside, do you really need to use ref in your call to AddParentNodeInTaskAlertXml? It seems unlikely to me. See my parameter passing article if you're not quite sure what ref really means.
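
As a rough illustration of the difference, here is a small sketch. The method names are hypothetical and I'm guessing at what AddParentNodeInTaskAlertXml does (adding a User element to the existing document); if it really replaces the document with a new one, ref would be justified.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RefDemo
{
    // No ref needed: the parameter is a copy of the reference, but it still
    // points at the caller's XDocument, so changes to the object are visible.
    public static void AddUser(XDocument doc, string userId)
    {
        doc.Root.Add(new XElement("User", new XAttribute("Id", userId)));
    }

    // ref is only needed when the method must make the caller's variable
    // refer to a different object.
    public static void ReplaceDocument(ref XDocument doc)
    {
        doc = new XDocument(new XElement("Replaced"));
    }

    static void Main()
    {
        XDocument doc = new XDocument(new XElement("TaskAlert"));

        AddUser(doc, "42");
        Console.WriteLine(doc.Root.Elements("User").Count()); // 1

        ReplaceDocument(ref doc);
        Console.WriteLine(doc.Root.Name.LocalName); // Replaced
    }
}
```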