My XPATH expression is selecting an incorrect node set

Question

I'm trying to teach myself XSL and XPATH. I have a sample XML document created by one of our commercial tools, and I want to extract certain node values and create a CSV file as output. A truncated example of the XML document is here:

<?xml version="1.0" encoding="windows-1252"?>
<xml_report> 
  <form id="WOI:WorkOrder" xmlns="http://www.w3.org/2000/xforms">
     <mode l>
        <group name="field-info" minOccurs="1" maxOccurs="1">
            <group name="field" minOccurs="1" maxOccurs="*">
               <string name="name" />
               <number name="id" long="true" />
               <string name="type" range="closed">
                  <value>CHAR</value>
                  <value>TIME</value>
                  <value>DECIMAL</value>
                  <value>REAL</value>
                  <value>INT</value>
                  <value>ENUM</value>
                  <value>ATTACH</value>
                  <value>DIARY</value>
                  <value>TIMEOFDAY</value>
                 <value>DATE</value>
                 <value>CURRENCY</value>
                 <value>NULL</value>
              </string>
           </group>
           <!-- Additional group nodes -->
        </group>
     </model>
     <instance>
        <field-info>
           <field>
              <name>Work Order ID*&#43;</name>
              <id>1000000182</id>
              <type> CHAR</type>
           </field>
           <!-- Additional field nodes -->
        </field-info>
        <entry>
           <field_value>
              <value>WO0000000498983</value>
           </field_value>
           <field_value>
              <value>New Host name for new server build</value>
           </field_value>
        </entry>
        <!-- Additional entry nodes -->
     </instance>
  </form>
</xml_report>

I want to extract the contents of the value elements only, filtering out everything else. I've written some pretty unsophisticated XSL to attempt to do this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text" omit-xml-declaration="yes" indent="yes" encoding="utf-8" media-type="text/plain" />
   <xsl:template match="/xml_report/form/instance">
      <xsl:for-each select="entry/field_value">
         <xsl:value-of select='value' /><xsl:text>,</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>

Given the example XML, I would expect the following output:

WO0000000498983,New Host name for new server build,

The problem is that the output also contains the text of every element that precedes the nodes I want, plus unwanted indentation and blank lines (the actual output is shown below). I thought that a restrictive XPath expression in the template match and for-each select attributes would suffice, but it does not. How can I narrow the selection to only the nodes I actually want? I'm using Saxon as the XSLT processor on Windows 7, if that helps.

              CHAR
              TIME
              DECIMAL
              REAL
              INT
              ENUM
              ATTACH
              DIARY
              TIMEOFDAY
              DATE
              CURRENCY
              NULL








           Work Order ID*+
           1000000182
            CHAR





           WO0000000498983


           New Host name for new server build

matthias_h · Accepted Answer · 2014-11-10T23:02:08

You do not get the desired output because the form element in your input XML declares a default namespace:

<form id="WOI:WorkOrder" xmlns="http://www.w3.org/2000/xforms">

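Therefore the form element and all of its descendants are in this namespace, and the unprefixed names in your XSLT (form, instance, entry and so on) do not match them. Because your template never matches, the built-in template rules take over and copy every text node to the output, which is the unwanted text you see. When the namespace is declared in the stylesheet, for example as xmlns:xforms="http://www.w3.org/2000/xforms", and the element names are prefixed accordingly, the following XSLT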

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xforms="http://www.w3.org/2000/xforms">
<xsl:output method="text" omit-xml-declaration="yes" 
          indent="yes" encoding="utf-8" media-type="text/plain" />
<!-- xml_report is in no namespace, so it is matched without a prefix -->
<xsl:template match="/xml_report">
    <xsl:apply-templates select="xforms:form"/>
</xsl:template>
<!-- form and all of its descendants are in the xforms namespace -->
<xsl:template match="xforms:form">
    <xsl:apply-templates select="xforms:instance"/>
</xsl:template>
<xsl:template match="xforms:instance">
  <xsl:for-each select="xforms:entry/xforms:field_value">
     <xsl:value-of select='xforms:value' /><xsl:text>,</xsl:text>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

when applied to your example XML (with <mode l> in line 4 corrected to <model>), produces the following output:

WO0000000498983,New Host name for new server build,
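
The unwanted text in the question's output comes from XSLT's built-in template rules, which apply whenever no template in the stylesheet matches a node. As a rough sketch (written out explicitly here for illustration only; a processor supplies these rules implicitly), a stylesheet whose own templates never match behaves like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>
  <!-- Built-in rule for the document node and for elements: recurse into the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Built-in rule for text and attribute nodes: copy the string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Since xsl:apply-templates without a select attribute visits child nodes only, attribute values are not printed, but every text node is, including the whitespace between elements. That accounts for the stray values, indentation and blank lines.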

To avoid misunderstandings: xforms is only the prefix I chose for the namespace declaration in this XSLT; any prefix works, because only the namespace URI has to match. It would, for example, be possible to declare the namespace as xmlns:xfo="http://www.w3.org/2000/xforms" and then change <xsl:apply-templates select="xforms:form"/> into <xsl:apply-templates select="xfo:form"/> (and likewise for the other elements currently prefixed with xforms:).
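
As a minimal sketch of that renaming (xfo is an arbitrary prefix picked for illustration), the same approach with the other prefix could look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xfo="http://www.w3.org/2000/xforms">
  <xsl:output method="text" encoding="utf-8"/>
  <!-- xml_report is in no namespace, so it takes no prefix -->
  <xsl:template match="/xml_report">
    <xsl:apply-templates select="xfo:form/xfo:instance"/>
  </xsl:template>
  <!-- The prefix differs from the answer's, the namespace URI is the same -->
  <xsl:template match="xfo:instance">
    <xsl:for-each select="xfo:entry/xfo:field_value">
      <xsl:value-of select="xfo:value"/><xsl:text>,</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The prefix exists only inside the stylesheet; the input XML does not need to use it.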

As you are using XSLT 2.0, it would also be possible to declare the xforms namespace as the xpath-default-namespace, as you're only targeting elements that are in this namespace. The adjusted XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xpath-default-namespace="http://www.w3.org/2000/xforms">
<xsl:output method="text" omit-xml-declaration="yes" 
            indent="yes" encoding="utf-8" media-type="text/plain" />
<!-- Unprefixed names resolve to the xforms namespace via xpath-default-namespace -->
<xsl:template match="form">
    <xsl:apply-templates select="instance"/>
</xsl:template>
<xsl:template match="instance">
  <xsl:for-each select="entry/field_value">
     <xsl:value-of select="value" /><xsl:text>,</xsl:text>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

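produces the same output. Because the xforms namespace is declared as the XPath default namespace, unprefixed element names in XPath expressions and match patterns are resolved in that namespace, so it is not necessary to declare an extra prefix and add it to the element names.
Another adjustment in this version is that the first template matches form instead of xml_report: xml_report is in no namespace, so with this default in effect an unprefixed xml_report would no longer match it.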
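
If the trailing comma is not wanted, one possible variation (not part of the stylesheets above) is the XSLT 2.0 separator attribute of xsl:value-of. The sketch below assumes that each entry should become one CSV line and that the values contain no commas or quotes that would need escaping:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xpath-default-namespace="http://www.w3.org/2000/xforms">
  <xsl:output method="text" encoding="utf-8"/>
  <!-- Start at the document node so that no built-in rule copies stray text -->
  <xsl:template match="/">
    <!-- xml_report is in no namespace, so step over it with the * wildcard -->
    <xsl:for-each select="*/form/instance/entry">
      <!-- separator joins all selected values, leaving no trailing comma -->
      <xsl:value-of select="field_value/value" separator=","/>
      <!-- End each entry with a line feed -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the example XML this should give the single line WO0000000498983,New Host name for new server build followed by a line feed.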

As a reference for namespaces, see e.g. http://www.w3.org/TR/REC-xml-names/#ns-decl or the answers to the question What does "xmlns" in XML mean?
