This is an example of the XML output I need to parse and validate against the XSD schema files:
<Record_Delimiter DocumentID="1.1" DocumentType="PARENT" DocumentName="SCHOOL" RelatedDocumentID=""/>
<xs:SCHOOL>
<xs:Name>some name</xs:Name>
<xs:ID>5908390481</xs:ID>
<xs:Address>some address</xs:Address>
</xs:SCHOOL>
<Record_Delimiter DocumentID="1.2" DocumentType="CHILD" DocumentName="STUDENTEXP" RelatedDocumentID="1.1"/>
<xs:STUDENTEXP>
<xs:STUDENT>
<xs:Name>some name</xs:Name>
<xs:SID>s1036456</xs:SID>
<xs:Age>12</xs:Age>
<xs:Address>some address</xs:Address>
<xs:Expenses>
<xs:Fees>800</xs:Fees>
<xs:Books>100</xs:Books>
<xs:Uniform>50</xs:Uniform>
<xs:Transport>10</xs:Transport>
</xs:Expenses>
</xs:STUDENT>
</xs:STUDENTEXP>
<Record_Delimiter DocumentID="1.3" DocumentType="CHILD" DocumentName="STUDENTEXP" RelatedDocumentID="1.1"/>
<xs:STUDENTEXP>
<xs:STUDENT>
<xs:Name>some name</xs:Name>
<xs:SID>s1036789</xs:SID>
<xs:Age>15</xs:Age>
<xs:Address>some address</xs:Address>
<xs:Expenses>
<xs:Fees>1000</xs:Fees>
<xs:Books>200</xs:Books>
<xs:Uniform>50</xs:Uniform>
<xs:Transport>10</xs:Transport>
</xs:Expenses>
</xs:STUDENT>
</xs:STUDENTEXP>
The file as a whole is not well-formed XML because there is no single root element wrapping all the other tags. But each record (i.e., SCHOOL and STUDENTEXP) is valid XML on its own and validates against its schema (school.xsd, studentexp.xsd).
I have never worked with this format and am not sure about a few things, such as how to parse such a file programmatically. Normally, using lxml, we could validate each record if it were in a separate file:
from lxml import etree
xmlschema = etree.XMLSchema(etree.parse('./studentexp.xsd'))
# assertValid raises etree.DocumentInvalid if the document does not conform
xmlschema.assertValid(etree.parse('./sampleStudentexp.xml'))
What is the proper way to extract the "records" and validate them separately?
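One direction that might work, shown here only as a rough sketch: treat each Record_Delimiter tag as a separator, cut the text into one chunk per record, and validate every chunk against the schema named by its DocumentName attribute. The names split_records, validate_records, and schema_paths are hypothetical, not part of lxml. The sketch also assumes that each delimiter is a self-closing tag, that the whole file fits in memory, and that each record parses on its own (for example, that the real data declares the xs prefix).

```python
import re
import xml.etree.ElementTree as ET

# Matches one self-closing Record_Delimiter tag (assumption: attribute values contain no '>').
DELIMITER_RE = re.compile(r"<Record_Delimiter\b[^>]*/>")


def split_records(text):
    """Yield (delimiter_attributes, record_xml) pairs from a delimited stream."""
    matches = list(DELIMITER_RE.finditer(text))
    for i, match in enumerate(matches):
        # The delimiter is a well-formed element by itself, so parse it for its attributes.
        attrs = dict(ET.fromstring(match.group(0)).attrib)
        # The record body runs up to the next delimiter, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield attrs, text[match.end():end].strip()


def validate_records(text, schema_paths):
    """Validate each record against the schema selected by its DocumentName.

    schema_paths is a hypothetical mapping, e.g.
    {"SCHOOL": "./school.xsd", "STUDENTEXP": "./studentexp.xsd"}.
    """
    from lxml import etree  # imported here so split_records works without lxml

    schemas = {
        name: etree.XMLSchema(etree.parse(path))
        for name, path in schema_paths.items()
    }
    for attrs, body in split_records(text):
        doc = etree.fromstring(body.encode("utf-8"))
        # assertValid raises etree.DocumentInvalid if the record does not conform.
        schemas[attrs["DocumentName"]].assertValid(doc)
```

Keeping the delimiter attributes next to each record also preserves DocumentID and RelatedDocumentID, so the parent/child links between SCHOOL and STUDENTEXP records could be rebuilt after validation.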