7
votes

I was wondering whether it is possible to use xsl:analyze-string with multiple groups in the regex and then store all of the matched groups in variables for later use.

Like so:

<xsl:analyze-string regex="^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee" select=".">
  <xsl:matching-substring>
    <xsl:variable name="varX">
      <xsl:value-of select="regex-group(1)"/>
    </xsl:variable>                                
    <xsl:variable name="varY">
      <xsl:value-of select="regex-group(2)"/>
    </xsl:variable>        
  </xsl:matching-substring>
</xsl:analyze-string>    

This doesn't actually work, but it's the sort of thing I'm after. I know I can wrap the analyze-string in a variable, but it seems daft to process the regex again for every group; that's not very efficient. I should be able to process the regex once and store all of the groups for use later on.

Any ideas?


2 Answers

8
votes

Well, does

<xsl:variable name="groups" as="element(group)*">
  <xsl:analyze-string regex="^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee" select=".">
    <xsl:matching-substring>
      <group>
        <x><xsl:value-of select="regex-group(1)"/></x>
        <y><xsl:value-of select="regex-group(2)"/></y>
      </group>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:variable>

help? That way you have a variable named groups which is a sequence of group elements with the captures.
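
To read the captures back out of $groups, something along these lines should work. This is only a sketch: the template match on t is an assumption for illustration, and the varX/varY names are borrowed from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: the text to analyze is the string value of a <t> element -->
  <xsl:template match="t">

    <!-- Run the regex once and keep every capture in a small element sequence -->
    <xsl:variable name="groups" as="element(group)*">
      <xsl:analyze-string regex="^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee" select=".">
        <xsl:matching-substring>
          <group>
            <x><xsl:value-of select="regex-group(1)"/></x>
            <y><xsl:value-of select="regex-group(2)"/></y>
          </group>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <!-- The variables are declared outside analyze-string, so they stay in
         scope for the rest of the template -->
    <xsl:variable name="varX" select="string($groups[1]/x)"/>
    <xsl:variable name="varY" select="string($groups[1]/y)"/>

    <xsl:value-of select="$varX, $varY"/>
  </xsl:template>
</xsl:stylesheet>
```

The regex is evaluated only once, and because the pattern is anchored with ^ there is at most one group element, hence the [1].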

6
votes

This transformation shows that xsl:analyze-string isn't necessary to obtain the wanted results; a simpler and more generic solution exists:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Match any element whose string value fits the pattern -->
  <xsl:template match="*[matches(., '^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee')]">

    <!-- Replace the match with its space-separated groups, then split on the space -->
    <xsl:variable name="vTokens" select=
      "tokenize(replace(., '^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee', '$1 $2'), ' ')"/>

    <xsl:variable name="varX" select="$vTokens[1]"/>
    <xsl:variable name="varY" select="$vTokens[2]"/>

    <xsl:sequence select="$varX, $varY"/>
  </xsl:template>
</xsl:stylesheet>

When applied to this XML document:

<t>Blah  123   Bloo  4567  Blee</t>

the transformation produces the wanted, correct result:

123 4567

Here we don't rely on knowing the regex (it can be supplied as a parameter) or the string. We simply replace the string with a delimited string of the regex groups and then tokenize it; every item in the sequence produced by tokenize() can readily be assigned to a corresponding variable.
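
A rough sketch of the parameterized variant might look like the following. The parameter names pRegex and pReplacement are hypothetical, and two details are my own assumptions rather than part of the stylesheet above: the pattern is additionally anchored with $ so that text after the match cannot leak into the last token, and | is used as the delimiter on the assumption that it never occurs in a captured value.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameters: the pattern and the delimited list of groups.
       Both can be overridden by the caller of the transformation. -->
  <xsl:param name="pRegex"
      select="'^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee$'"/>
  <xsl:param name="pReplacement" select="'$1|$2'"/>

  <!-- XSLT 2.0 allows references to global parameters in match patterns -->
  <xsl:template match="*[matches(., $pRegex)]">

    <!-- tokenize() takes a regex, so the literal | delimiter is escaped -->
    <xsl:variable name="vTokens"
        select="tokenize(replace(., $pRegex, $pReplacement), '\|')"/>

    <xsl:variable name="varX" select="$vTokens[1]"/>
    <xsl:variable name="varY" select="$vTokens[2]"/>

    <xsl:value-of select="$varX, $varY" separator=", "/>
  </xsl:template>
</xsl:stylesheet>
```

The approach only holds up if the delimiter cannot appear inside a captured group and the pattern covers the whole string; otherwise the tokens will not line up with the groups.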

We don't have to dig the wanted results out of a temporary tree; we just get them all in a result sequence.