If someone can help me out with this I will reward you with baked goods!!

Saxon in .NET

I know how to run an XSLT transformation in streaming mode; that's not a problem. What I'm trying to do now is get a single node out of a stream that represents a huge XML document. I start with:

var xpath=@"/x/ns1:y/ns2:z";
var myStream = System.IO.File.OpenRead(@"c:\superHuge.xml"); // verbatim string, so "\s" is not parsed as an escape

XdmValue nodeZ=null;

/// now I need to find nodeZ by evaluating xpath over the XML
/// coming in over myStream

I know I could try to generate some kind of XSLT transformation on the fly from the expression 'xpath', run it against the stream, and produce a result document containing the resulting node set. But for my implementation that's going to be really smelly. I need to be able to throw a bunch of XPath expressions at the stream, one after the other, and get the resulting nodes.

Does anyone know how this can be done with Saxon EE? If it can't, is there another product that would support it?

I have not used that feature, but if you don't need to open the stream yourself in your C# code and can let Saxon do the work instead, you can use saxon:stream(doc("file:///C:/superhuge.xml")/x/ns1:y/ns2:z). See saxonica.com/documentation/index.html#!sourcedocs/streaming/… for details. I am not sure whether that feature works only with XQuery or XSLT, or whether you can use it with plain XPath as well, but whether you use XPath or have to use XQuery should not make much of a difference. - Martin Honnen
I'm using a process where it will be a memory stream for smaller documents and a file stream for larger documents, so I need to operate against an abstract stream if possible. - David Jessee

1 Answer

You might find that the best way to tackle this is using Saxon's XQuery with document projection. Essentially this works by filtering the event stream from the XML parser and building a tree that only contains the nodes that contribute to the result of the query. XQuery works better than XSLT for this because it's more amenable to static analysis, as it lacks the polymorphism of XSLT's template rules.
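To make the idea of filtering the parser's event stream concrete, here is a minimal sketch that does not use Saxon at all: it swaps in .NET's built-in XmlReader and LINQ to XML, handles only simple child-axis paths, and returns XElement rather than XdmValue. The PathProjection class and its Select method are hypothetical names made up for this illustration. Subtrees that cannot contribute to the result are skipped without being built.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class PathProjection
{
    // Hypothetical helper: yields each element reached by following `steps`
    // from the root along the child axis, e.g. x / ns1:y / ns2:z.
    public static IEnumerable<XElement> Select(Stream input, params XName[] steps)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            int matched = 0;              // steps matched by the currently open ancestors
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    int depth = reader.Depth;
                    bool hit = depth == matched && depth < steps.Length
                        && reader.LocalName == steps[depth].LocalName
                        && reader.NamespaceURI == steps[depth].NamespaceName;
                    if (hit && depth == steps.Length - 1)
                    {
                        // Build a tree for this subtree only; ReadFrom leaves
                        // the reader on the node that follows the element.
                        yield return (XElement)XNode.ReadFrom(reader);
                        more = !reader.EOF;
                        continue;
                    }
                    if (hit)
                    {
                        if (!reader.IsEmptyElement) matched = depth + 1;
                    }
                    else
                    {
                        reader.Skip();    // discard the subtree without building it
                        more = !reader.EOF;
                        continue;
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    matched = reader.Depth;   // leaving a matched ancestor
                }
                more = reader.Read();
            }
        }
    }
}
```

With placeholder namespace URIs, a call might look like PathProjection.Select(myStream, "x", ns1 + "y", ns2 + "z"), where ns1 and ns2 are XNamespace values. Saxon's projection is more general than this, since it derives the filter from static analysis of an arbitrary query rather than from a fixed list of steps.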

By a strange coincidence, my colleague O'Neil Delpratt has been working on test cases for the .NET API and spotted today that there's no direct way of invoking XQuery document projection using that API. Take a look at it, though, and try it from the command line or from Java. I'm sure it can be done in .NET; it might just require digging a bit deeper than the public API.

To do the same thing with XSLT, I think you would have to generate a stylesheet programmatically. It wouldn't necessarily be very complicated: something like

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    <xsl:copy-of select="---your path here---"/>
  </xsl:template>

</xsl:stylesheet>
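Generating and running such a stylesheet from C# could look roughly like the sketch below. It assumes the Saxon 9.x-style .NET API (Processor, XsltCompiler, XsltTransformer.SetInputStream, XdmDestination) and a licensed Saxon-EE Processor, since streaming is an EE feature. StreamedSelect, BuildStylesheet and the namespaceDecls parameter are hypothetical names introduced for illustration, and I have not verified the code against a particular Saxon release.

```csharp
using System;
using System.IO;
using System.Security;
using Saxon.Api;

static class StreamedSelect
{
    // Wraps `path` in a streamable stylesheet. `namespaceDecls` carries the
    // xmlns:... attributes for the prefixes the path uses (e.g. ns1, ns2).
    static string BuildStylesheet(string path, string namespaceDecls)
    {
        return "<xsl:stylesheet version='3.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform' " + namespaceDecls + ">"
             + "<xsl:mode streamable='yes'/>"
             + "<xsl:template match='/'>"
             + "<xsl:copy-of select=\"" + SecurityElement.Escape(path) + "\"/>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the generated stylesheet over `input` and returns a document node
    // whose children are copies of the selected nodes.
    public static XdmNode Evaluate(Processor processor, Stream input, Uri baseUri,
                                   string path, string namespaceDecls)
    {
        XsltCompiler compiler = processor.NewXsltCompiler();
        compiler.BaseUri = baseUri;
        XsltExecutable executable =
            compiler.Compile(new StringReader(BuildStylesheet(path, namespaceDecls)));

        XsltTransformer transformer = executable.Load();
        transformer.SetInputStream(input, baseUri);   // source is read as a stream, not a prebuilt tree

        XdmDestination result = new XdmDestination();
        transformer.Run(result);
        return result.XdmNode;
    }
}
```

Each call consumes the stream once, so evaluating several expressions one after another means rewinding a seekable stream (for example, setting Position back to 0) or reopening it between calls. Paths that Saxon cannot analyse as streamable should be rejected when the stylesheet is compiled.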