
I'm having problems getting lxml to validate some XML. The XSD schema and the XML file both come from Amazon documentation, so they should be compatible. But the XML itself refers to another schema, and that schema is not being loaded.

Here is my code, which is based on the lxml validation tutorial:

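from lxml import etree

# Compile the schema, then validate the sample feed against it.
xsd_doc = etree.parse('ProductImage.xsd')
xsd = etree.XMLSchema(xsd_doc)
xml = etree.parse('ProductImage_sample.xml')
xsd.validate(xml)  # returns False here
print(xsd.error_log)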

"ProductImage_sample.xml:2:0:ERROR:SCHEMASV:SCHEMAV_CVC_ELT_1: Element 'AmazonEnvelope': No matching global declaration available for the validation root."

I get no errors if I validate against amzn-envelope.xsd instead of ProductImage.xsd, but that seems to defeat the point of checking whether a given Image feed is valid. All the XSD and XML files mentioned are in my working directory, along with my Python script.

Here is a snippet of the sample XML, which should definitely be valid:

<?xml version="1.0"?>
<AmazonEnvelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="amzn-envelope.xsd">
    <Header>
        <DocumentVersion>1.01</DocumentVersion>
        <MerchantIdentifier>Q_M_STORE_123</MerchantIdentifier>
    </Header>
    <MessageType>ProductImage</MessageType>
    <Message>
        <MessageID>1</MessageID>
        <OperationType>Update</OperationType>
        <ProductImage>
            <SKU>1234</SKU>

Here is a snippet of the schema (this file is not public so I can't show all of it):

<?xml version="1.0"?>
<!-- Revision="$Revision: #5 $" -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xsd:include schemaLocation="amzn-base.xsd"/>
    <xsd:element name="ProductImage">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="SKU"/>

I can say that following the include to amzn-base.xsd does not end up reaching a definition of the AmazonEnvelope element. So my question is: can lxml load schemas via an attribute such as the one in <AmazonEnvelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="amzn-envelope.xsd">? And if not, how can I validate my Image feed?

lxml doesn't handle xsi attributes dealing with schema location. What I can't seem to correlate is what node from the sample XML would match the ProductImage element definition in the XSD; in other words, do you get a tag <ProductImage/> somewhere in your sample XML? - Petru Gardea
Yes, there is the <ProductImage> tag as you can see and as the indentation implies, that closes after including some subelements starting with SKU. - Tom Viner
It turns out that xsi:noNamespaceSchemaLocation="amzn-envelope.xsd" is really a hint to me to validate against that file, as I found it has includes for all the sub-schemas, including ProductImage.xsd. - Tom Viner
Must've been blind. Since schema location doesn't work, if you want to validate just the ProductImage node, I would create an ElementTree from the ProductImage node (use XPath to get to it) and validate that; it should work, according to lxml.de/api.html#trees-and-documents. - Petru Gardea
cool, didn't realise you could validate part of a document. - Tom Viner
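
The subtree approach from the comments could look roughly like the sketch below. The schema and feed here are hypothetical, heavily reduced stand-ins (the real ProductImage.xsd is not public and pulls in amzn-base.xsd), so treat this as an illustration of the technique rather than a test against the actual Amazon files.

```python
from lxml import etree

# Hypothetical, heavily reduced stand-in for ProductImage.xsd.
PRODUCT_IMAGE_XSD = b"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xsd:element name="ProductImage">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="SKU" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""

# Reduced version of the sample feed from the question.
FEED_XML = b"""<?xml version="1.0"?>
<AmazonEnvelope>
    <MessageType>ProductImage</MessageType>
    <Message>
        <MessageID>1</MessageID>
        <ProductImage>
            <SKU>1234</SKU>
        </ProductImage>
    </Message>
</AmazonEnvelope>
"""

schema = etree.XMLSchema(etree.fromstring(PRODUCT_IMAGE_XSD))
feed = etree.fromstring(FEED_XML)

# The envelope root has no global declaration in this schema, so this fails.
whole_ok = schema.validate(feed)

# Locate the ProductImage node with XPath, wrap it in its own ElementTree,
# and validate only that subtree.
product_image = feed.xpath("//ProductImage")[0]
subtree_ok = schema.validate(etree.ElementTree(product_image))

print(whole_ok, subtree_ok)
```

This only checks the ProductImage fragment; the envelope around it (Header, MessageType and so on) is not validated at all.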

1 Answer


The answer is that I should validate against the parent schema file, amzn-envelope.xsd, which is the one named at the top of the XML file. That schema contains the line:

<xsd:include schemaLocation="ProductImage.xsd"/>
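
To see why the include matters, here is a minimal sketch using two tiny stand-in schemas written to a temporary directory. Their contents are assumptions made up for illustration and only borrow the file names from the question; the real Amazon schemas are far larger.

```python
import os
import tempfile

from lxml import etree

# Hypothetical, reduced stand-in for ProductImage.xsd.
CHILD_XSD = b"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xsd:element name="ProductImage">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="SKU" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""

# Hypothetical, reduced stand-in for amzn-envelope.xsd; it includes the child.
PARENT_XSD = b"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xsd:include schemaLocation="ProductImage.xsd"/>
    <xsd:element name="AmazonEnvelope">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="ProductImage"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""

FEED_XML = b"<AmazonEnvelope><ProductImage><SKU>1234</SKU></ProductImage></AmazonEnvelope>"

with tempfile.TemporaryDirectory() as workdir:
    for name, content in (("ProductImage.xsd", CHILD_XSD),
                          ("amzn-envelope.xsd", PARENT_XSD)):
        with open(os.path.join(workdir, name), "wb") as handle:
            handle.write(content)

    # Parse from file paths so the relative xsd:include can be resolved.
    child_schema = etree.XMLSchema(etree.parse(os.path.join(workdir, "ProductImage.xsd")))
    parent_schema = etree.XMLSchema(etree.parse(os.path.join(workdir, "amzn-envelope.xsd")))

feed = etree.fromstring(FEED_XML)
child_ok = child_schema.validate(feed)    # root element is not declared here
parent_ok = parent_schema.validate(feed)  # envelope and ProductImage both declared
child_errors = [entry.message for entry in child_schema.error_log]

print(child_ok, parent_ok)
print(child_errors)
```

The child schema alone rejects the document for the same reason as in the question: it has no global declaration for the root element. The parent schema accepts the envelope and still checks the ProductImage content through the include.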

In general, then, lxml won't act on a declaration such as xsi:noNamespaceSchemaLocation="amzn-envelope.xsd". But if you can find the parent schema and validate against that, it should include the specific schema you're interested in.
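
Since lxml leaves the hint alone, one option is to read the attribute yourself and compile whatever schema it names. The sketch below does that with a hypothetical helper, schema_from_hint, which is not part of lxml. It assumes the hint is a path relative to the XML file's directory, and the schema and feed contents are made-up stand-ins.

```python
import os
import tempfile

from lxml import etree

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_from_hint(xml_path):
    """Hypothetical helper: compile the schema named by xsi:noNamespaceSchemaLocation."""
    doc = etree.parse(xml_path)
    hint = doc.getroot().get("{%s}noNamespaceSchemaLocation" % XSI_NS)
    if hint is None:
        raise ValueError("document carries no xsi:noNamespaceSchemaLocation hint")
    # Assumption: the hint is a path relative to the XML file's directory.
    xsd_path = os.path.join(os.path.dirname(os.path.abspath(xml_path)), hint)
    return doc, etree.XMLSchema(etree.parse(xsd_path))


# Made-up stand-ins for the envelope schema and a feed that points at it.
ENVELOPE_XSD = b"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xsd:element name="AmazonEnvelope">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="MessageType" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""

FEED_XML = b"""<?xml version="1.0"?>
<AmazonEnvelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="amzn-envelope.xsd">
    <MessageType>ProductImage</MessageType>
</AmazonEnvelope>
"""

with tempfile.TemporaryDirectory() as workdir:
    xml_path = os.path.join(workdir, "ProductImage_sample.xml")
    for path, content in ((os.path.join(workdir, "amzn-envelope.xsd"), ENVELOPE_XSD),
                          (xml_path, FEED_XML)):
        with open(path, "wb") as handle:
            handle.write(content)

    feed, schema = schema_from_hint(xml_path)
    hint_ok = schema.validate(feed)

print(hint_ok)
```

Only follow such hints for documents you trust, since the attribute lets the document choose the schema it is checked against.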