My data class, which will be serialized into XML, looks like this:

[XmlType(TypeName = "SPCFileInfo")]
[Serializable]
public class SPCFileInfoProtocol
{
    [XmlElement("CompanyCode")]
    public string CompanyCode { get; set; }
    [XmlElement("FileName")]
    public string FileName { get; set; }
    [XmlElement("FileVer")]
    public int FileVer { get; set; }
    [XmlElement("FileSize")]
    public long FileSize { get; set; }
    [XmlElement("CreatedOn")]
    public DateTime CreatedOn { get; set; }
    [XmlElement("LastUpdatedOn")]
    public DateTime LastUpdatedOn { get; set; }
    [XmlElement("FileBytes")]
    public byte[] FileBytes { get; set; }
}

And here's my serialization utility class:

public static class XmlSerializer
{
    public static string SerializeToString<T>(T item)
    {
        if (item == null)
        {
            return null;
        }

        System.Xml.Serialization.XmlSerializer serializer = new System.Xml.Serialization.XmlSerializer(typeof(T));

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UnicodeEncoding(false, false); // no BOM in a .NET string
        settings.Indent = false;
        settings.OmitXmlDeclaration = false;

        using (StringWriter textWriter = new StringWriter())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(textWriter, settings))
            {
                serializer.Serialize(xmlWriter, item);
            }
            return textWriter.ToString();
        }
    }

    public static T DeserializeFromString<T>(string xmlString)
    {
        T item = default(T);

        try
        {
            using (StringReader stringReader = new StringReader(xmlString))
            {
                System.Xml.Serialization.XmlSerializer xmlSerializer =
                new System.Xml.Serialization.XmlSerializer(typeof(T));
                item = (T)xmlSerializer.Deserialize(stringReader);
            }
        }
        catch (Exception ex)
        {
            Trace.WriteLine(ex.ToString());
        }

        return item;
    }
}

Serialization into XML works fine, but when I attempt to deserialize, I get the following exception:

XMLException: There is an error in XML document. hexadecimal value 0x00, is an invalid character.

Upon investigation, I found out that certain character codes are not valid in an XML document. Removing the invalid characters is not an option, since they make up the bytes of a file.

My question is: how do you serialize/deserialize a data class like the one above into XML without stripping the invalid bytes? If this is not possible, what are some viable alternatives?

Edit: Upon request, here's the full stack trace of the error:

System.InvalidOperationException: There is an error in XML document (1, 21933). ---> System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character. Line 1, position 21933.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlTextReader.Read()
   at System.Xml.XmlReader.ReadElementString()
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderSPCCommandProtocol.Read2_SPCCommandProtocol(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderSPCCommandProtocol.Read3_SPCCommand()
   --- End of inner exception stack trace ---
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader)
   at NextSPCFileUpdater.Utilities.XmlSerializer.DeserializeFromString[T](String xmlString) in C:\Source Codes\SPC\nextspc-fileupdater\NextSPCFileUpdater\Utilities\XmlSerializer.cs:line 48

And here's the new version of the deserialization method:

public static T DeserializeFromString<T>(string xmlString)
{
    T item = default(T);

    try
    {
        using (StringReader stringReader = new StringReader(xmlString))
        using (XmlTextReader xmlTextReader = new XmlTextReader(stringReader) { Normalization = false })
        {
            System.Xml.Serialization.XmlSerializer xmlSerializer =
            new System.Xml.Serialization.XmlSerializer(typeof(T));
            item = (T)xmlSerializer.Deserialize(xmlTextReader);
        }
    }
    catch (Exception ex)
    {
        Trace.WriteLine(ex.ToString());
    }

    return item;
}

1 Answer

As you've noticed, there are lots of characters that may not be present in an XML document. These can be included in your data, however, using the proper escape sequence.

With normalization switched on, the XmlTextReader range-checks numeric character entities and rejects ones such as &#x0;. If I recall correctly, the XmlSerializer creates a normalizing XmlTextReader to wrap the TextReader you pass it. To override that, you need to create the reader yourself and set its Normalization property to false.

Regardless of whether my recollection of the causes of the problem is correct, however, setting Normalization to false will solve your problem:

var xmlReader = new XmlTextReader(textReader) { Normalization = false };

Or rather, in your case:

using (StringReader stringReader = new StringReader(xmlString))
using (XmlTextReader xmlTextReader = new XmlTextReader(stringReader) { Normalization = false })
{
    System.Xml.Serialization.XmlSerializer xmlSerializer =
    new System.Xml.Serialization.XmlSerializer(typeof(T));
    item = (T)xmlSerializer.Deserialize(xmlTextReader);
}
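
To see the effect of the setting in isolation, here is a minimal sketch that reads the same small document twice, once with Normalization off and once with it on. The names NormalizationDemo and ReadText and the sample document are made up for illustration; I'd expect the second read to fail with the same kind of XmlException you are seeing, but treat that as an assumption to check on your runtime.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; not part of the original question's code.
public static class NormalizationDemo
{
    // Reads the text content of the root element with the given Normalization setting.
    static string ReadText(string xml, bool normalization)
    {
        using (var stringReader = new StringReader(xml))
        using (var xmlTextReader = new XmlTextReader(stringReader) { Normalization = normalization })
        {
            xmlTextReader.MoveToContent();            // skip to the root element
            return xmlTextReader.ReadElementString(); // read its text content
        }
    }

    public static void Main()
    {
        // A character entity for 0x00, which a normalizing reader range-checks.
        string xml = "<r>a&#x0;b</r>";

        // Normalization off: the entity is decoded instead of rejected.
        string text = ReadText(xml, false);
        Console.WriteLine("Decoded length: " + text.Length);

        // Normalization on: the reader is expected to throw on the entity.
        try
        {
            ReadText(xml, true);
            Console.WriteLine("No exception with Normalization = true");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```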

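As an aside, most will find your code far more readable if you use some using directives (you would have to rename your utility class first, since inside a class named XmlSerializer the short name refers to your own class):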

using System.IO;
using System.Xml;
using System.Xml.Serialization;

using (StringReader stringReader = new StringReader(xmlString))
using (XmlTextReader xmlTextReader = new XmlTextReader(stringReader) { Normalization = false })
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
    item = (T)xmlSerializer.Deserialize(xmlTextReader);
}

Still more will find it more readable if you use var (though I have at least one colleague who disagrees):

using System.IO;
using System.Xml;
using System.Xml.Serialization;

using (var stringReader = new StringReader(xmlString))
using (var xmlTextReader = new XmlTextReader(stringReader) { Normalization = false })
{
    var xmlSerializer = new XmlSerializer(typeof(T));
    item = (T)xmlSerializer.Deserialize(xmlTextReader);
}
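
One more thing that may be worth checking, and this part is a guess on my part: your stack trace goes through ReadElementString, which suggests the offending 0x00 sits in a string element rather than in FileBytes. XmlSerializer writes a byte[] property as base64 text, so raw file bytes kept in a byte[] should round-trip without any special reader settings. A minimal sketch, where Payload and Base64Demo are hypothetical names standing in for your own types:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for a data class that carries raw file bytes.
public class Payload
{
    public byte[] FileBytes { get; set; }
}

public static class Base64Demo
{
    public static void Main()
    {
        // Bytes that would be invalid as XML characters if written as text.
        var original = new Payload { FileBytes = new byte[] { 0x00, 0x01, 0xFF } };
        var serializer = new XmlSerializer(typeof(Payload));

        // Serialize: the byte[] is emitted as base64 text inside <FileBytes>.
        string xml;
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, original);
            xml = writer.ToString();
        }

        // Deserialize with the plain TextReader overload, no custom XmlTextReader.
        Payload copy;
        using (var reader = new StringReader(xml))
        {
            copy = (Payload)serializer.Deserialize(reader);
        }

        Console.WriteLine(xml);
        Console.WriteLine("Bytes preserved: " + copy.FileBytes.SequenceEqual(original.FileBytes));
    }
}
```

If the binary data is currently travelling in a string property somewhere in the class being deserialized, moving it into a byte[] like this would be an alternative to relaxing the reader.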