Wrapping text and elements in paragraph with XSLT 1.0

Question

I have a problem that I have solved using XSLT 2.0, but now I need to be able to do the same thing in XSLT 1.0 (because of a constraint to use an XSLT 1.0 compatible processor).

I actually need this for several kinds of XHTML and XML, and in slightly different scenarios, but for simplicity I'll give an XHTML example in the hope of finding a general solution:

Say I have an HTML table like this:

<table frame="void">
        <col width="50%" />
        <col width="50%" />
        <thead>
            <tr>
                <th></th>
                <th></th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>text text <b>text</b> text <i>text</i> text</td>
                <td>text text <b>text</b> text <i>text</i> text<img src="mypic.png" alt="mypic"
                     />text <b>text</b> text</td>
            </tr>
            <tr>
                <td>
                    <table frame="void">
                        <col width="50%" />
                        <col width="50%" />
                        <thead>
                            <tr>
                                <th></th>
                                <th></th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td></td>
                                <td></td>
                            </tr>
                            <tr>
                                <td></td>
                                <td></td>
                            </tr>
                        </tbody>
                    </table>
                </td>
                <td><img src="mypic.png" alt="mypic" /></td>
            </tr>
            <tr>
                <td></td>
                <td></td>
            </tr>
        </tbody>
    </table>

And what I want now is to wrap all the text and inline elements in the table cells with <p> tags, to get this:

<table frame="void">
        <col width="50%" />
        <col width="50%" />
        <thead>
            <tr>
                <th></th>
                <th></th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>
                    <p>text text <b>text</b> text <i>text</i> text</p>
                </td>
                <td><p>text text <b>text</b> text <i>text</i> text</p><img src="mypic.png"
                        alt="mypic" /><p>text <b>text</b> text</p></td>
            </tr>
            <tr>
                <td>
                    <table frame="void">
                        <col width="50%" />
                        <col width="50%" />
                        <thead>
                            <tr>
                                <th></th>
                                <th></th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td></td>
                                <td></td>
                            </tr>
                            <tr>
                                <td></td>
                                <td></td>
                            </tr>
                        </tbody>
                    </table>
                </td>
                <td><img src="mypic.png" alt="mypic" /></td>
            </tr>
            <tr>
                <td></td>
                <td></td>
            </tr>
        </tbody>
    </table>

Note that two other cells contain only a nested table or an image. If a cell contains only an image or a table (and no text or inline elements), that content should not be wrapped in p tags.

Also note that one of the images has surrounding text and inline elements. In such a case the text and inline elements before the image (or table, or whatever non-inline element it might be) should be wrapped in one p tag, and the text and inline elements after it should be wrapped in another p tag. (In this use case img is considered a non-inline element.)

In XSLT 2.0 I handled this with the following template, which I call from the template for table cells instead of just applying templates to the children:

<xsl:template name="wrapInPara">
    <xsl:apply-templates select="@class"></xsl:apply-templates>
    <xsl:for-each-group select="node()[not(self::text() and normalize-space(.) = '')]"            
        group-adjacent="boolean(self::text() | self::e:b | self::e:i | self::e:em | self::e:strong | self::e:a | self::e:u | self::e:span)">
        <xsl:choose>
            <xsl:when test="current-grouping-key()">
                <p>                        
                    <xsl:apply-templates select="current-group()"/>
                </p>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:for-each-group>
</xsl:template>

And calling it something like this:

<xsl:template match="td">
    <xsl:copy>
        <xsl:call-template name="wrapInPara"/>
    </xsl:copy>
</xsl:template>

(As you can see, more tags than <b> and <i> need to be considered, and tags other than nested tables or images might need to be excluded from the wrapping, so I'm hoping for an answer that can be modified for similar use cases if possible.)

I have been trying to figure out how to do something like this in XSLT 1.0 looking at the Muenchian grouping method, which similar problems seem to suggest, but I can't get it to work.

Any help greatly appreciated!

Tim C · Accepted Answer · 2016-09-07T08:51:00

Here's one way you can do it. Effectively, it groups the "para" nodes (text and inline elements) by their parent and by the first preceding sibling that isn't a "para" node. The wrapInPara template then selects the non-"para" nodes, and wraps any "para" nodes that directly follow each of them (looked up using the key) in a p tag.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*" />

    <xsl:key name="para" 
             match="text()|b|i|em|strong|a|u|span" 
             use="concat(generate-id(..), '|', generate-id(preceding-sibling::node()[not((self::text()|self::b|self::i|self::em|self::strong|self::a|self::u|self::span))][1]))" />

    <xsl:template name="wrapInPara">
        <xsl:apply-templates select="@class" />
        <!-- Handle `para` elements that have no preceding non-para nodes -->
        <xsl:call-template name="groupInPara">
            <xsl:with-param name="group" select="key('para', concat(generate-id(), '|'))" />
        </xsl:call-template>
        <xsl:for-each select="node()[not((self::text()|self::b|self::i|self::em|self::strong|self::a|self::u|self::span))]">
            <xsl:apply-templates select="." />
            <!-- Wrap any following `para` elements -->
            <xsl:call-template name="groupInPara">
                <xsl:with-param name="group" select="key('para', concat(generate-id(..), '|', generate-id()))" />
            </xsl:call-template>
        </xsl:for-each>
    </xsl:template>

    <xsl:template name="groupInPara">
        <xsl:param name="group" />
        <xsl:if test="$group">
            <p>
                <xsl:apply-templates select="$group" />
            </p>
        </xsl:if>
    </xsl:template>

    <xsl:template match="td">
        <xsl:copy>
            <xsl:call-template name="wrapInPara"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
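
The XSLT 2.0 template in the question uses an e: prefix, which suggests the real input is in a namespace, whereas the stylesheet above matches unprefixed names and so only handles no-namespace input. Below is a minimal sketch of a namespace-aware variant. It assumes that e is bound to the XHTML namespace http://www.w3.org/1999/xhtml (the question does not show the binding), and it declares the same URI as the default namespace so that the generated p elements land in that namespace too. Treat it as a starting point rather than a tested drop-in.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:e="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="e">
    <!-- Assumption: "e" is bound to the XHTML namespace; change the URI to match your input -->
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Same key as in the answer, with namespace-qualified element names -->
    <xsl:key name="para"
             match="text()|e:b|e:i|e:em|e:strong|e:a|e:u|e:span"
             use="concat(generate-id(..), '|', generate-id(preceding-sibling::node()[not(self::text()|self::e:b|self::e:i|self::e:em|self::e:strong|self::e:a|self::e:u|self::e:span)][1]))"/>

    <xsl:template name="wrapInPara">
        <xsl:apply-templates select="@class"/>
        <!-- Leading run: inline nodes with no preceding non-inline sibling -->
        <xsl:call-template name="groupInPara">
            <xsl:with-param name="group" select="key('para', concat(generate-id(), '|'))"/>
        </xsl:call-template>
        <!-- Each non-inline node, followed by the inline run keyed to it -->
        <xsl:for-each select="node()[not(self::text()|self::e:b|self::e:i|self::e:em|self::e:strong|self::e:a|self::e:u|self::e:span)]">
            <xsl:apply-templates select="."/>
            <xsl:call-template name="groupInPara">
                <xsl:with-param name="group" select="key('para', concat(generate-id(..), '|', generate-id()))"/>
            </xsl:call-template>
        </xsl:for-each>
    </xsl:template>

    <xsl:template name="groupInPara">
        <xsl:param name="group"/>
        <!-- Emit a p only when the run is non-empty -->
        <xsl:if test="$group">
            <p>
                <xsl:apply-templates select="$group"/>
            </p>
        </xsl:if>
    </xsl:template>

    <xsl:template match="e:td">
        <xsl:copy>
            <xsl:call-template name="wrapInPara"/>
        </xsl:copy>
    </xsl:template>

    <!-- Identity template for everything else -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

To adapt this to a different set of inline elements, the same list has to be changed in three places: the key's match pattern, the predicate inside the key's use expression, and the predicate of the xsl:for-each. XSLT 1.0 does not allow variable references in the match or use attributes of xsl:key, so the list cannot be factored out into a variable.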
