2
votes

I'm using Python lxml library to parse xml files. I need to validate a xml file against a xml schema. lxml supports XML schema validation, but the xml schema filepath/content must be provided (information available here: http://lxml.de/validation.html). However, I don't know the xml schema filepath beforehand, it should be parsed from the xml file header tags. I can't find a way to access these tags.

lxml supports this use case somehow?

1

1 Answers

2
votes

If the schema is linked using attributes on the root element, in the http://www.w3.org/2001/XMLSchema-instance namespace, you can retrieve these with lxml by prefixing the attribute name with the namespace URL in curly braces:

XMLSchemaNamespace = '{http://www.w3.org/2001/XMLSchema-instance}'
document = lxml.parse(xmlfile)
schemaLink = document.get(XMLSchemaNamespace + 'schemaLocation')
if schemaLink is None:
    schemaLink = document.get(XMLSchemaNamespace + 'noNamespaceSchemaLocation')

and then use a URL library to load the schema from the location referred. See lxml namespace handling for more information.