I am trying to define an XSD template for the following:

<template_data>
  <given_name lang="ENG">Zluty</given_name>
  <given_name lang="CES">Žlutý</given_name>
</template_data>

So far, I've come up with

<xs:complexType name="attribute_CES">
  <xs:attribute name="lang" type="xs:string" use="required" fixed="CES"/>
</xs:complexType>

<xs:complexType name="attribute_ENG">
  <xs:attribute name="lang" type="xs:string" use="required" fixed="ENG"/>
</xs:complexType>

<xs:element name="template_data">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="given_name" type="attribute_CES"/>
      <xs:element name="given_name" type="attribute_ENG"/>          
    </xs:sequence>
  </xs:complexType>
</xs:element>

The problem is that this declares the same element name twice in one content model, each time with a different type, and every XSD validator I've tried rejects it.

As far as I know, you can require an attribute to have a specific value with the fixed option, and that is included in the definition of a (complex) type. So if you want the attribute with a different value, you would have to define a new type.

What I need is for template_data to include both given_name elements, exactly once with lang="CES" and exactly once with lang="ENG". Is there a way to write an XSD schema for that, or is it impossible (for example, because the XML input doesn't conform to standards)?

This is not possible with XSD since this means validating the content - XSD can only validate the schema. You'll need something like Schematron to achieve what you need. - Filburt

Really? I've seen some basic content validation with XSD, using restriction (w3schools.com/schema/schema_facets.asp) and with fixed in attributes (w3schools.com/schema/schema_simple_attributes.asp), or with types. - Humungus

1 Answer

You can't declare two elements with the same name with different types in the same context, but I think I understand what you want to do.

If you really had elements with very different contents, it would make sense to create two types (and it would also make sense for them to have different names or to at least occur in another context). Since your data is similar, and the main difference is an attribute which describes the text content of the element, you can create one type and restrict the values the attribute can receive:

<xs:complexType name="languageType">
    <xs:simpleContent>
        <xs:extension base="xs:string">
            <xs:attribute name="lang" use="required">
                <xs:simpleType>
                    <xs:restriction base="xs:NMTOKEN">
                        <xs:enumeration value="ENG"/>
                        <xs:enumeration value="CES"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
        </xs:extension>
    </xs:simpleContent>
</xs:complexType>

In languageType above you have simple content (xs:string) and a required lang attribute which can only have two values: ENG or CES.

If you want to guarantee that there are exactly two elements, you can restrict that in your template_data element definition with minOccurs="2" and maxOccurs="2" for the given_name child element:

<xs:element name="template_data">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="given_name" type="languageType" minOccurs="2" maxOccurs="2"/>        
        </xs:sequence>
    </xs:complexType>
    ...

This still allows two given_name elements that both carry lang="ENG". To rule that out, add an xs:key definition to the template_data element declaration:

<xs:element name="template_data">
    <xs:complexType> ... </xs:complexType>
    <xs:key name="languageKey">
        <xs:selector xpath="given_name" />
        <xs:field xpath="@lang"/>
    </xs:key>
</xs:element>

The xs:key selects the nested given_name elements and uses each one's lang attribute as the key field. Key values must be unique within template_data, so two given_name elements cannot share the same lang value. Since exactly two elements are allowed and lang can only be ENG or CES, one has to be ENG and the other CES.
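
For reference, the fragments above can be assembled into a single schema document. This is a sketch under some assumptions that are not in the fragments themselves: the xs:schema wrapper, the XML declaration, and the absence of a target namespace (which is what lets the unprefixed given_name path in the selector match).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: no targetNamespace, so all elements are in no namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Text content plus a required lang attribute limited to ENG or CES -->
    <xs:complexType name="languageType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
                <xs:attribute name="lang" use="required">
                    <xs:simpleType>
                        <xs:restriction base="xs:NMTOKEN">
                            <xs:enumeration value="ENG"/>
                            <xs:enumeration value="CES"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>

    <xs:element name="template_data">
        <xs:complexType>
            <xs:sequence>
                <!-- Exactly two given_name children -->
                <xs:element name="given_name" type="languageType"
                            minOccurs="2" maxOccurs="2"/>
            </xs:sequence>
        </xs:complexType>
        <!-- Each given_name must have a distinct lang value -->
        <xs:key name="languageKey">
            <xs:selector xpath="given_name"/>
            <xs:field xpath="@lang"/>
        </xs:key>
    </xs:element>

</xs:schema>
```

If the schema were given a target namespace with qualified local elements, the selector path would need a matching prefix, so treat the unprefixed form as specific to this no-namespace setup.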

Now these XML documents validate:

<template_data>
    <given_name lang="ENG">Zluty</given_name>
    <given_name lang="CES">Žlutý</given_name>
</template_data>

<template_data>
    <given_name lang="CES">Žlutý</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

But these don't:

<template_data>
    <given_name lang="FRA">Zluty</given_name>
    <given_name lang="CES">Žlutý</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

<template_data>
    <given_name lang="ENG">Zluty</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

<template_data>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

<template_data>
    <given_name>Zluty</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>