3
votes

I want to validate large XML files against XSD schemas in C#. Even for a file of 1000 lines of XML, validation takes a long time.

Are there any tips and tricks to validate faster?

Can you post some code examples that validate large XML files faster?

Edit 1: I validate as described in "Validating XML with XSD".

Edit 2: For large files, validation takes more than 10 seconds, and I need it to be very fast: under a second.

Edit 3: The file size is greater than 10 MB.

Edit 4: I am also considering another approach: storing both the XML file and the XSD in a database.

2
How large is the file (in bytes; "lines" is ambiguous), how long is it currently taking, and how are you currently doing it? – Marc Gravell

2 Answers

4
votes

You are currently loading the entire document into memory, which is expensive regardless of validation. A better option is to parse via a reader, for example as shown here on MSDN. The key points from the example on that page:

  • it never loads the entire document
  • the while (reader.Read()) loop just enumerates the entire file at the node level
  • validation is enabled via the XmlReaderSettings
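
The points above can be sketched roughly as follows. This is a minimal sketch and not the MSDN sample itself: the class name StreamingValidator, the method name Validate, and the choice to count problems instead of throwing are all assumptions made for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

static class StreamingValidator
{
    // Hypothetical helper: streams xmlPath through a validating reader and
    // returns how many validation errors and warnings were reported.
    public static int Validate(string xmlPath, string xsdPath)
    {
        int problems = 0;

        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        // Passing null as the namespace uses the schema's own targetNamespace.
        settings.Schemas.Add(null, xsdPath);
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) =>
        {
            problems++;
            Console.WriteLine($"{e.Severity}: line {e.Exception?.LineNumber}: {e.Message}");
        };

        // The reader pulls one node at a time, so the document is never
        // held in memory as a whole.
        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            while (reader.Read()) { } // validation happens as a side effect of reading
        }

        return problems;
    }
}
```

A call such as StreamingValidator.Validate("data.xml", "schema.xsd") (both file names are placeholders) returns 0 for a valid document. Malformed XML is not reported through the handler; it surfaces as an XmlException from Read().
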
2
votes

It's reasonable to expect parsing a document with validation to take about twice as long as parsing without validation. But that ratio will vary a great deal depending on your schema. For example, if every attribute is controlled by a regular expression, and the regex is complex, then the overhead of validation could be far higher than this rule of thumb suggests.
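
One way to check how far your own schema is from that ratio is to time the same read loop with and without validation. The following is only a rough sketch: ValidationTiming and TimeRead are hypothetical names, and a single Stopwatch run is a crude measurement, so repeat it several times before drawing conclusions.

```csharp
using System;
using System.Diagnostics;
using System.Xml;
using System.Xml.Schema;

static class ValidationTiming
{
    // Hypothetical helper: reads the whole file once and returns the elapsed time.
    // Pass null for schemas to parse without validation.
    public static TimeSpan TimeRead(string xmlPath, XmlSchemaSet schemas)
    {
        var settings = new XmlReaderSettings();
        if (schemas != null)
        {
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas = schemas;
            // Swallow validation problems: only the cost is of interest here.
            settings.ValidationEventHandler += (sender, e) => { };
        }

        var watch = Stopwatch.StartNew();
        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            while (reader.Read()) { }
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Comparing TimeRead(path, null) with TimeRead(path, compiledSchemas) for the same file gives the overhead for that particular schema and document.
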

Also, this rule of thumb doesn't allow for the cost of building a complex schema. If you have a big schema defining hundreds of element types, compiling the schema could take longer than using it to validate a few megabytes of data.
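
If schema compilation turns out to be the expensive part, one option is to compile the schema once and reuse it for every document. The sketch below assumes a single schema file and a process that validates many documents; CachedSchemaValidator, LoadSchema and IsValid are hypothetical names, not part of any library.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

static class CachedSchemaValidator
{
    // Compiled once, then shared by every validation run.
    private static XmlSchemaSet _schemas;

    public static void LoadSchema(string xsdPath)
    {
        var set = new XmlSchemaSet();
        set.Add(null, xsdPath); // null: use the schema's own targetNamespace
        set.Compile();          // pay the compilation cost a single time
        _schemas = set;
    }

    public static bool IsValid(string xmlPath)
    {
        if (_schemas == null)
            throw new InvalidOperationException("Call LoadSchema first.");

        bool valid = true;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = _schemas // reuse the already compiled set
        };
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) valid = false;
        };

        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            while (reader.Read()) { }
        }
        return valid;
    }
}
```

Call LoadSchema once at startup and IsValid for each incoming file, so that only the first call pays for building the schema.
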