In attempting to solve a separate issue, I switched to the Saxon XSLT processor and have been struggling to get the syntax of my code to work. The purpose of the code is to iterate through a list of HTML files, finding the first instance of any header in each page and converting it to an H1 (since we have to use H2s for our PDF output but need H1s for our HTML output).

I start with a batch file:

set outputDir=%1
@set Saxon=C:\Users\%username%\saxon\saxon9he.jar

REM Create filelist
dir %outputDir%\*.htm /b /s /A-D > file_list.txt
@echo ^<filelist^>^</filelist^> > pre_filelist.xml

REM XML-ize filelist
java -cp %Saxon% net.sf.saxon.Transform -s:pre_filelist.xml -xsl:convert_filelist.xsl -o:pre_list.xml

REM Replace starting h2 tags with h1 tags
java -cp %Saxon% net.sf.saxon.Transform -s:pre_list.xml -xsl:h2toh1.xsl -o:null.xml

REM Clean up temporary files
DEL pre_list.xml
DEL pre_filelist.xml
DEL file_list.txt

pause

This finds all of the output HTML files and formats them as a list using convert_filelist.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Set output style. XML with no indentations -->
    <xsl:output indent="no" method="xml" omit-xml-declaration="yes"/>

<!-- Holds the name of the file list text file as a global variable; the file itself is read by unparsed-text() below. -->
    <xsl:variable name="fileList">file_list.txt</xsl:variable>  

<!-- Parses the file list text file to create an XML list of files that can be fed to the transformer -->
    <xsl:template match="filelist">
    <!-- Create a variable that can be parsed -->
        <xsl:variable name="filelist_raw"><xsl:value-of select="unparsed-text($fileList,'UTF-8')"/></xsl:variable>
    <!-- Create a open and close file tags for each line in the list -->
        <xsl:variable name="driveLetter"><xsl:value-of select="substring-before(unparsed-text($fileList,'UTF-8'),':')"/>:<xsl:text disable-output-escaping="yes">\\</xsl:text></xsl:variable>
        <xsl:variable name="driveLetterReplacement"><xsl:text disable-output-escaping="yes">&lt;file&gt;file:///</xsl:text><xsl:value-of select="$driveLetter"/></xsl:variable>
    <!-- Generate an xml tree. The value-of is doing a text-level replacement. Looking for the drive letter and replacing it  -->
    <!-- with the file open tag and drive letter. Looking for the file extension and replacing with the extension and file close tag. -->
        <file_list><xsl:value-of select="replace(replace(replace($filelist_raw,'\.htm','.htm&lt;/file&gt;'),$driveLetter,$driveLetterReplacement),'\\','/')" disable-output-escaping="yes"/></file_list>
    </xsl:template>
</xsl:stylesheet>

It then converts the first header to an H1 using h2toh1.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
    xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Set output style: XML, indented for now (normally not indented). -->
    <xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>

<!-- Begin traversing the list of files in the output folder. -->
    <xsl:template match="file_list">
        <xsl:for-each select="*">
            <xsl:variable name="filename" select="."/>
            <xsl:variable name="content" select="document($filename)"/>

<!-- Generate a new output file to replace the Flare generated file. Uses the same file name. Transparent to the end user. -->
            <xsl:result-document href="{$filename}" method="xml">
                <xsl:apply-templates select="document($filename)">
                    <xsl:with-param name="content" select="$content"/>
                </xsl:apply-templates>
            </xsl:result-document>

        </xsl:for-each>
    </xsl:template>

<!-- Recreate each node as it appears in the generated document -->
    <xsl:template match="*">
        <xsl:param name="content"/>
        <xsl:variable name="name" select="name(.)"/>
        <xsl:element name="{$name}">
            <xsl:for-each select="@*">
                <xsl:copy-of select="."/>
            </xsl:for-each>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>

<!-- Select the first header and change it to an h1. -->
    <xsl:template match="*[matches(name(), 'h\d')][1]">
        <xsl:element name="h1">
            <xsl:for-each select="@*|node()">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

However, I then receive both of these warnings for each file in the list:

Warning at char 9 in xsl:variable/@select on line 13 column 63 of h2toh1.xsl: XTRE1500: Cannot read a document that was written during the same transformation: file:///C:/TechDocs/Projects/ScriptTest/Output/JPittman/Docs11/Default.htm

Warning at char 9 in xsl:apply-templates/@select on line 17 column 55 of h2toh1.xsl: XTRE1500: Cannot read a document that was written during the same transformation: file:///C:/TechDocs/Projects/ScriptTest/Output/JPittman/Docs11/Default.htm

I understand the cause of the issue, but I can't understand how to get around it. I also attempted to use the collection function, since rewriting every page seemed clunky anyway, but I don't understand how to implement that. Any help?

As a side note, I'm aware that I need to spend some time investigating disable-output-escaping but am just trying to get the code to work first. – Jenny Pittman
"Create a open and close file tags" is a big warning in XSLT... – Alejandro
Which version of Saxon do you use? That error code seems to be from the XSLT 2 spec while the current Saxon version 9.9 is an XSLT 3 processor where I would expect a different error code (w3.org/TR/xslt-30/#result-document-restrictions). But expecting to be able to read and "then" write the same file is not something XSLT supports. So you should write to a different file and use other tools to copy/move/overwrite the input with the output. – Martin Honnen
Or you try to exploit the wording "if the same absolute URI is used to access the resource" and try to read and write the same file but with different absolute URIs. – Martin Honnen
@JennyPittman if you spend some time investigating disable-output-escaping then you will discover that using it is nearly always (a) unnecessary, and (b) a really bad idea. It's a feature that appeals to beginners who don't know how to use XSLT properly. – Michael Kay
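
A minimal sketch of the write-to-a-different-file approach suggested in the comments above. It assumes every listed file is well-formed XML that document() can parse. The `.new` suffix and the variable names `inUri` and `outUri` are hypothetical choices for illustration, not part of the original scripts. Only the file handling is shown; the heading template from the question would sit alongside the copy template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Walk the <file> entries of pre_list.xml -->
    <xsl:template match="file_list">
        <xsl:for-each select="file">
            <!-- URI of the existing page, exactly as listed -->
            <xsl:variable name="inUri" select="normalize-space(.)"/>
            <!-- Hypothetical naming scheme: Default.htm becomes Default.htm.new -->
            <xsl:variable name="outUri" select="concat($inUri, '.new')"/>

            <!-- The read and write URIs differ, so no file is both read and written -->
            <xsl:result-document href="{$outUri}" method="xml">
                <xsl:apply-templates select="document($inUri)"/>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>

    <!-- Copy every node and attribute unchanged by default -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

After the transformation finishes, a separate step outside XSLT (for example in the batch file) would move each `.new` file over its original.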

1 Answer

The reason the error is defined in the spec is that the order of execution is not defined, so if you read and write the same file in one transformation, there is in principle no way of predicting whether the read happens before or after the write. (Of course, in practice, that's often not true, because there will be a functional dependency.)

You can usually work around the restriction, at your own risk, by using subtly different URLs for the read and the write. For example, query parameters at the end of the URI (like ?version=1) will usually be ignored on file:/// URIs.
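
The workaround above might look like the following sketch. The variable names `writeUri` and `readUri` and the `?version=1` suffix are illustrative. Whether the suffix really is ignored on `file:///` URIs depends on the processor and its URI resolver, so it is worth trying this on a copy of the output folder first. The heading pattern is an assumption about the intent (the first `h1` to `h6` element in document order), not the pattern from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="file_list">
        <xsl:for-each select="file">
            <!-- URI used for writing: exactly as listed in pre_list.xml -->
            <xsl:variable name="writeUri" select="normalize-space(.)"/>
            <!-- URI used for reading: same file, different absolute URI.
                 Assumption: the query part is ignored when the file is opened. -->
            <xsl:variable name="readUri" select="concat($writeUri, '?version=1')"/>
            <!-- Declared before xsl:result-document, but the spec still gives
                 no guarantee that the read happens before the write. -->
            <xsl:variable name="doc" select="document($readUri)"/>

            <xsl:result-document href="{$writeUri}" method="xml">
                <xsl:apply-templates select="$doc"/>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>

    <!-- Copy every node and attribute unchanged by default -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Rename the first heading in the document to h1, keeping its namespace -->
    <xsl:template match="*[matches(local-name(), '^h[1-6]$')]
                          [not(preceding::*[matches(local-name(), '^h[1-6]$')])]">
        <xsl:element name="h1" namespace="{namespace-uri()}">
            <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

It would be run the same way as h2toh1.xsl in the batch file, with pre_list.xml as the source document.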