
I have to parse an XML document in which the xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" namespace declaration is missing, so the XML looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<program>
  <scriptList>
    <script type="StartScript">
      <isUserScript>false</isUserScript>
    </script>
  </scriptList>
</program>

but should look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<program xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
  <scriptList>
    <script xsi:type="StartScript">
      <isUserScript>false</isUserScript>
    </script>
  </scriptList>
</program>

The xsi:type attribute ensures that the correct subclass is instantiated during deserialization, e.g.

class StartScript : script
{...}

The parser is auto-generated from a handwritten XSD via $> xsd.exe a.xsd /classes (.NET). Here is the XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
elementFormDefault="qualified" attributeFormDefault="qualified">

  <!-- Main element -->
  <xs:element name="program">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="scriptList">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="script" type="script" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="script" />

  <xs:complexType name="StartScript">
    <xs:complexContent>
      <xs:extension base="script">
        <xs:all>
          <xs:element name="isUserScript" type="xs:boolean"></xs:element>
        </xs:all>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>

A simple solution is to run a string replace (" type=\"" to " xsi:type=\"") on the input XML, but this is pretty ugly. Is there a better solution?

How are you parsing it? Are you using XmlSerializer, or something else? Are there other types of script, or only StartScript? - dbc
Parsing is simple in C#: _program = (new XmlSerializer(typeof(program))).Deserialize(f) as program; There are a lot of scripts; StartScript is just one example. - Harald

1 Answer


You can load your XML into an intermediate LINQ to XML XDocument, fix the attribute namespaces on the <script> elements, then deserialize directly to your final class:

// Requires: using System.Xml; using System.Xml.Linq; using System.Xml.Serialization;
// Load to intermediate XDocument
XDocument xDoc;
using (var reader = XmlReader.Create(f))
    xDoc = XDocument.Load(reader);

// Fix namespace of "type" attributes
XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
foreach (var element in xDoc.Descendants("script"))
{
    var attr = element.Attribute("type");
    if (attr == null)
        continue;
    var newAttr = new XAttribute(xsi + attr.Name.LocalName, attr.Value);
    attr.Remove();
    element.Add(newAttr);
}

// Deserialize directly to final class.
var program = xDoc.Deserialize<program>();

Using the extension method:

public static class XObjectExtensions
{
    public static T Deserialize<T>(this XContainer element, XmlSerializer serializer = null)
    {
        if (element == null)
            throw new ArgumentNullException(nameof(element));
        using (var reader = element.CreateReader())
            return (T)(serializer ?? new XmlSerializer(typeof(T))).Deserialize(reader);
    }
}
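
For reference, here is a minimal, self-contained sketch of the same idea that you can paste into a console project. The program, script and StartScript classes below are simplified hand-written stand-ins for what xsd.exe would generate from your schema (an assumption; your generated classes will carry more attributes), and Demo and LoadWithFixedTypes are hypothetical names used only for illustration:

```csharp
using System;
using System.Xml.Linq;
using System.Xml.Serialization;

// Simplified stand-ins for the xsd.exe-generated classes (assumed shape).
// XmlInclude tells XmlSerializer which subclasses xsi:type may refer to.
[XmlInclude(typeof(StartScript))]
public class script { }

public class StartScript : script
{
    public bool isUserScript { get; set; }
}

public class program
{
    [XmlArray("scriptList")]
    [XmlArrayItem("script")]
    public script[] scriptList { get; set; }
}

public static class Demo
{
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Hypothetical helper: moves unqualified "type" attributes on <script>
    // elements into the xsi namespace, then deserializes the document.
    public static program LoadWithFixedTypes(string xml)
    {
        XDocument xDoc = XDocument.Parse(xml);
        foreach (var element in xDoc.Descendants("script"))
        {
            var attr = element.Attribute("type");
            if (attr == null)
                continue;
            attr.Remove();
            element.Add(new XAttribute(Xsi + "type", attr.Value));
        }

        var serializer = new XmlSerializer(typeof(program));
        using (var reader = xDoc.CreateReader())
            return (program)serializer.Deserialize(reader);
    }

    public static void Main()
    {
        const string xml = @"<program>
  <scriptList>
    <script type=""StartScript"">
      <isUserScript>false</isUserScript>
    </script>
  </scriptList>
</program>";

        program result = LoadWithFixedTypes(xml);
        // Print the runtime type chosen for the first script element.
        Console.WriteLine(result.scriptList[0].GetType().Name);
    }
}
```

Because the attribute is rewritten on the in-memory XDocument, the raw input is never touched, and elements without a type attribute are simply left as they are.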