
How do I parse escaped XHTML from an XML node and then apply templates to it as if the XML node were the parent of the XHTML?

Referring to my code below, I want to take the value of group/Clob/text(), convert it to XHTML/XML, and continue processing the resulting nodes with templates, such as the center template.

My overall goal is to transform XML from a dataset containing Clob values produced by a rich text editor into an XSL-FO file, which is then used to create a PDF.

The details below should illustrate what I have tried so far.

<!--XML-->
<root>

  <group>
    <key>16692504</key>
    <Clob>&lt;body&gt;Testing se&lt;font color="#99cc00"&gt;co&lt;/font&gt;nd o&lt;font color="#99cc00" style="background-color: #000000;"&gt;bser&lt;/font&gt;vation&lt;/body&gt;</Clob>
  </group>
  <group>
    <key>16692508</key>
    <Clob>&lt;body&gt;Testing se&lt;font color="#99cc00"&gt;co&lt;/font&gt;nd o&lt;font color="#99cc00" style="background-color: #000000;"&gt;bser&lt;/font&gt;vation&lt;/body&gt;</Clob>
  </group>

</root>

Here is the XSL file that I'm using to convert the escaped contents of the Clob node back to their original format, HTML/XHTML:

<?xml version="1.0"?>
<!--xsl-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">

  <!-- Not called anywhere below; no template uses mode="unescaped", so only the built-in rules apply in that mode -->
  <xsl:template name="group">
    <xsl:for-each select="//group">
      <xsl:apply-templates mode="unescaped" select="Clob"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="/">
    <xsl:copy>
      <xsl:apply-templates select="self::node()/*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- escaped-text is computed here but not used below -->
    <xsl:variable name="escaped-text">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="replace" select="'&quot;'"/>
        <xsl:with-param name="with" select="'\&quot;'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="parent::*[name() = 'Clob']"> <!-- Converts escaped characters in this node back to XHTML/XML -->
        <xsl:value-of disable-output-escaping="yes" select="normalize-space(.)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of disable-output-escaping="no" select="normalize-space(.)"/> <!-- Preserves escaping for non-essential fields -->
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
      <xsl:when test="contains($text,$replace)">
        <xsl:value-of select="substring-before($text,$replace)"/>
        <xsl:value-of select="$with"/>
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text,$replace)"/>
          <xsl:with-param name="replace" select="$replace"/>
          <xsl:with-param name="with" select="$with"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:choose>
          <xsl:when test="parent::*[name() = 'Clob']">
            <xsl:value-of disable-output-escaping="yes" select="$text"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of disable-output-escaping="no" select="$text"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="body">
    <fo:block text-align="body">
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>

Once the XML comes back in the form shown below, I want to call templates for the resulting elements (including their attributes) and ultimately return the output of those templates.

<!-- Desired XML Output for further processing -->
<root>

  <group>
    <key>16692504</key>
    <Clob><![CDATA[<body><h1>Testing<br /><br /><font color="#00ff00">testing</font><br /><br /><font color="#ffff00">testing</font></h1></body>]]></Clob>
  </group>
  <group>
    <key>16692508</key>
    <Clob><body>Testing se<font color="#99cc00">co</font>nd o<font color="#99cc00" style="background-color: #000000;">bser</font>vation</body></Clob>
  </group>

</root>

In the end, this will let me generate XSL-FO, which is then used to create PDF files. That way, HTML rich-text fields can be converted into XSL-FO that renders the HTML as its formatted equivalent.

I am using an XSL-FO stylesheet and guide created by Doug Tidwell at IBM, which adds the further hurdle of calling it as a template from my main XSL stylesheet. Here are the details for that stylesheet:

https://www.ibm.com/developerworks/library/x-xslfo2app/index.html#artdownload

So which XSLT processor do you use? You have tagged the question with both xslt-1.0 and xslt-2.0 and you haven't explained which XSLT processor you use. If you use an XSLT 2 processor then there is always github.com/davidcarlisle/web-xslt/blob/master/htmlparse/… to parse (X)HTML, even if the XSLT processor doesn't provide an extension to do that. - Martin Honnen
@MartinHonnen is this possible with XSLT 1.0? I see you answered someone's question with an XSLT 3.0 solution and hinted at how it could be done in earlier versions. stackoverflow.com/questions/52535431/… - Krptodr
I am not aware of anyone having done an HTML parser implementation in XSLT 1. Of course, many XSLT 1 processors have their own proprietary way of allowing extension functions written in the programming language or platform they are implemented in (e.g. a .NET XSLT 1 processor like XslCompiledTransform allows access to other .NET APIs, for instance the HTML Agility Pack to parse HTML, and an XSLT 1 processor implemented in Java can often easily access HTML parser APIs written in Java). But as Saxon 9 provides an XSLT 3 processor for both .NET and Java, I don't tend to use XSLT 1. - Martin Honnen
@MartinHonnen is it appropriate to say I need an HtmlParser? Is that the only way to turn the value of an element into XML and run templates off of it? - Krptodr
Well, XSLT/XPath 3 have tools for that with parse-xml; earlier versions rely on extension functions such as saxonica.com/html/documentation9.6/functions/saxon/parse.html. For your XSLT 1 processor, you will need to find out whether it offers such an extension or easily lets you call into some XML parsing API. In pure XSLT 1, you would need to use two separate stylesheets, where the second processes the serialized output of the first; in the first, you could then use disable-output-escaping. - Martin Honnen
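The "two separate stylesheets" route from the last comment can be sketched with XslCompiledTransform, assuming a .NET processor as the answer below does. This is a minimal sketch, not the asker's setup: the class name TwoPassDemo, the `font` to `fo:inline` mapping, and the single-quoted attributes in the sample data (used only to keep the C# verbatim strings readable) are illustrative assumptions. Pass 1 writes the Clob text with disable-output-escaping into a string; pass 2 parses that string, so `body` and `font` are now real elements that ordinary templates can match.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical demo class: two chained XSLT 1.0 transforms.
public static class TwoPassDemo
{
    // Sample data from the question (attribute quotes switched to single quotes).
    const string Input = @"<root>
  <group>
    <key>16692504</key>
    <Clob>&lt;body&gt;Testing se&lt;font color='#99cc00'&gt;co&lt;/font&gt;nd o&lt;font color='#99cc00' style='background-color: #000000;'&gt;bser&lt;/font&gt;vation&lt;/body&gt;</Clob>
  </group>
</root>";

    // Pass 1: identity copy, except the Clob text is written unescaped.
    const string Pass1 = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='Clob/text()'>
    <xsl:value-of select='.' disable-output-escaping='yes'/>
  </xsl:template>
</xsl:stylesheet>";

    // Pass 2: normal templates match the re-parsed body and font elements.
    // Only @color is mapped; the style attribute is ignored in this sketch.
    const string Pass2 = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:fo='http://www.w3.org/1999/XSL/Format'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <fo:flow flow-name='xsl-region-body'><xsl:apply-templates select='root/group/Clob/body'/></fo:flow>
  </xsl:template>
  <xsl:template match='body'>
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>
  <xsl:template match='font'>
    <fo:inline color='{@color}'><xsl:apply-templates/></fo:inline>
  </xsl:template>
</xsl:stylesheet>";

    static string Transform(string xslt, string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }
        // Output goes to a TextWriter, where disable-output-escaping is honored.
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    // Pass 2 parses pass 1's serialized output as a brand-new document.
    public static string Run()
    {
        return Transform(Pass2, Transform(Pass1, Input));
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

This only works if the rich text editor produces well-formed XHTML (for example `<br />` rather than `<br>`); otherwise pass 2 fails to parse the output of pass 1.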

1 Answer


XslCompiledTransform allows you to use extension "script" in C# or VB.NET with the msxsl:script element, or extension objects passed with https://docs.microsoft.com/en-us/dotnet/api/system.xml.xsl.xsltargumentlist.addextensionobject?view=netframework-4.7.2#System_Xml_Xsl_XsltArgumentList_AddExtensionObject_System_String_System_Object_. So, in theory, you can easily write a C# function that takes a string argument, feeds it to an XPathDocument over a StringReader of that string (https://docs.microsoft.com/en-us/dotnet/api/system.xml.xpath.xpathdocument.-ctor?view=netframework-4.7.2#System_Xml_XPath_XPathDocument__ctor_System_IO_TextReader_), and returns CreateNavigator() on the XPathDocument (i.e. an XPathNavigator). XSLT can then call that function and use normal XPath navigation or template processing on the result, just as with any other input document.

I can't tell whether your setup allows you to enable "script" processing on XslCompiledTransform (https://docs.microsoft.com/en-us/dotnet/api/system.xml.xsl.xsltsettings.enablescript?view=netframework-4.7.2#System_Xml_Xsl_XsltSettings_EnableScript) or to set up an extension object on the XsltArgumentList you pass to the Transform method of XslCompiledTransform, but the C# code is rather simple:

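using System.IO;
using System.Xml.XPath;

// Parses a string of well-formed XML and returns a navigator on its root node.
// XPathDocument reads the whole input in its constructor, so disposing the reader afterwards is safe.
public XPathNavigator ParseXml(string xmlCode)
{
  using (StringReader sr = new StringReader(xmlCode))
  {
    return new XPathDocument(sr).CreateNavigator();
  }
}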
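A minimal end-to-end sketch of the extension-object route, assuming a .NET setup where you control the XsltArgumentList passed to Transform. The class name ClobParser, the namespace URI urn:clob-parser, and the `font` to `fo:inline` mapping are hypothetical names chosen for illustration. The stylesheet calls `ext:ParseXml` on each Clob's string value and applies templates to the returned `body` element, which is the "process those new nodes against templates" step from the question. It relies on XslCompiledTransform letting path steps follow an XPathNavigator returned from an extension function, as the answer describes.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical extension object wrapping the ParseXml method from the answer.
public class ClobParser
{
    public XPathNavigator ParseXml(string xmlCode)
    {
        using (StringReader sr = new StringReader(xmlCode))
        {
            return new XPathDocument(sr).CreateNavigator();
        }
    }
}

public static class ExtensionObjectDemo
{
    // Sample data from the question (attribute quotes switched to single quotes).
    const string Input = @"<root>
  <group>
    <key>16692504</key>
    <Clob>&lt;body&gt;Testing se&lt;font color='#99cc00'&gt;co&lt;/font&gt;nd o&lt;font color='#99cc00' style='background-color: #000000;'&gt;bser&lt;/font&gt;vation&lt;/body&gt;</Clob>
  </group>
</root>";

    // The ext prefix is bound to the hypothetical URI urn:clob-parser,
    // the same URI that is passed to AddExtensionObject in Run().
    const string Xslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'
    xmlns:ext='urn:clob-parser'
    exclude-result-prefixes='ext'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <fo:flow flow-name='xsl-region-body'>
      <xsl:apply-templates select='root/group/Clob'/>
    </fo:flow>
  </xsl:template>
  <!-- Parse the Clob text, then keep applying templates to the new nodes. -->
  <xsl:template match='Clob'>
    <xsl:apply-templates select='ext:ParseXml(string(.))/body'/>
  </xsl:template>
  <xsl:template match='body'>
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>
  <xsl:template match='font'>
    <fo:inline color='{@color}'><xsl:apply-templates/></fo:inline>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run()
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(xsltReader);
        }
        // Register the extension object under the namespace used in the stylesheet.
        var args = new XsltArgumentList();
        args.AddExtensionObject("urn:clob-parser", new ClobParser());
        using (var input = XmlReader.Create(new StringReader(Input)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, args, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Compared with the two-pass approach, this keeps everything in one stylesheet, so the parsed `body` and `font` nodes can be handed straight to existing templates such as those in an XSL-FO stylesheet.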

XPathDocument can even parse fragments, if needed, when you use an appropriate XmlReader over the StringReader, so implementing something along the lines of the XPath 3.1 parse-xml-fragment function would also be possible.
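A sketch of that fragment variant, under the same .NET assumptions: ParseXmlFragment and FragmentParser are hypothetical names. Setting ConformanceLevel.Fragment on XmlReaderSettings is what lets the reader accept mixed top-level text and elements such as `Testing se<font ...>co</font>nd`, which is not a well-formed document on its own. This only approximates parse-xml-fragment; it does not reproduce the exact XPath 3.1 semantics.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class FragmentParser
{
    // Hypothetical helper: parses an XML fragment (several top-level nodes allowed)
    // and returns a navigator on the root node that contains them.
    public static XPathNavigator ParseXmlFragment(string xmlCode)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var sr = new StringReader(xmlCode))
        using (var reader = XmlReader.Create(sr, settings))
        {
            // XPathDocument loads everything in its constructor.
            return new XPathDocument(reader).CreateNavigator();
        }
    }

    public static void Main()
    {
        XPathNavigator nav = ParseXmlFragment("Testing se<font color='#99cc00'>co</font>nd");
        // Number of top-level nodes: text, element, text.
        Console.WriteLine(nav.Evaluate("count(node())"));
        // Attributes of the parsed elements are reachable with ordinary XPath.
        Console.WriteLine(nav.SelectSingleNode("font/@color").Value);
        // String value of the whole fragment.
        Console.WriteLine(nav.Value);
    }
}
```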