
I have an XML file in which I need to combine the values of repeated elements into a single element and make sure there are no duplicates. Below is the input XML file.

           <AIRPORTSFILE>
           <document name="SAMPLE1">
                 <DEPARTURE_AIRPORT>D1</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-03-15</DEPARTURE_DATE>
                 <DEPARTURE_TIME>0615</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-03-14</ARRIVAL_DATE>
                 <ARRIVAL_TIME>0930</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>A1</ARRIVAL_AIRPORT>

                 <DEPARTURE_AIRPORT>D2</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-03-14</DEPARTURE_DATE>
                 <DEPARTURE_TIME>0615</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-03-15</ARRIVAL_DATE>
                 <ARRIVAL_TIME>0930</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>A2</ARRIVAL_AIRPORT>

                 <DEPARTURE_AIRPORT>D2</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-03-15</DEPARTURE_DATE>
                 <DEPARTURE_TIME>0615</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-03-15</ARRIVAL_DATE>
                 <ARRIVAL_TIME>0930</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>A2</ARRIVAL_AIRPORT>
          </document>


          <document name="SAMPLE2">
                 <DEPARTURE_AIRPORT>2014-06-05</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-06-05</DEPARTURE_DATE>
                 <DEPARTURE_TIME>1815</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-06-05</ARRIVAL_DATE>
                 <ARRIVAL_TIME>2130</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>P1</ARRIVAL_AIRPORT>

                 <DEPARTURE_AIRPORT>2014-06-06</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-06-06</DEPARTURE_DATE>
                 <DEPARTURE_TIME>1815</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-06-05</ARRIVAL_DATE>
                 <ARRIVAL_TIME>2130</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>P1</ARRIVAL_AIRPORT>
          </document>
          </AIRPORTSFILE>

The output needs to be:

         <catalog>
         <document name="SAMPLE1">
                <departureDate>2014-03-15,2014-03-14</departureDate>
                <arrivalAirport>A1,A2</arrivalAirport>
         </document>
         <document name="SAMPLE2">
                <departureDate>2014-06-05,2014-06-06</departureDate>
                <arrivalAirport>P1</arrivalAirport>
         </document>
         </catalog>

I have looked at "XSLT 1.0 - Remove Duplicate Nodes From Variable" and "XSLT 1.0 - Remove duplicates fields" for reference, but I cannot get it to work properly.

Below is what I have in my XSLT 1.0 stylesheet to get DEPARTURE_DATE to work.

<xsl:key name="kDepartureDate" match="DEPARTURE_DATE" use="."/>


<xsl:template match="@* | node()" name="Copy">
   <xsl:copy>
     <xsl:apply-templates select="@* | node()"/>
   </xsl:copy>
 </xsl:template>

<xsl:template match="DEPARTURE_DATE[generate-id() = 
                           generate-id(key('kDepartureDate', .)[1])]"  name="depDateCopy">
    <xsl:call-template name="Copy" />
</xsl:template>

<xsl:template match="AIRPORTSFILE">
    <catalog>
        <xsl:for-each select="document">
        <xsl:variable name="departureDate">
                <xsl:call-template name="depDateCopy"></xsl:call-template>
        </xsl:variable>
        </xsl:for-each>
     </catalog>
</xsl:template>

Any help will be much appreciated.

The most interesting part about your XSLT code is the presence of the <catalog> element in the template matching AIRPORTSFILE. – michael.hor257k
catalog is the root element that I want in the output XML. Can you help me with removing the duplicates? – Raj
I want catalog. But that doesn't make a difference to the reason the duplicates are not being removed, does it? – Raj
What makes you think they are not being removed? – michael.hor257k
@michael.hor257k This is the result I am getting for one document: <document name="Sample1"> <departureDate> D1 2014-03-15 0615 2014-03-14 0930 A1 D2 2014-03-14 0615 2014-03-15 0930 A2 D2 2014-03-15 0615 2014-03-15 0930 A2 </departureDate> <arrivalAirport>ALC,ALC,PFO</arrivalAirport> </document> – Raj

1 Answer


Your current code looks so complicated and long-winded to me that I think it's best to start from scratch. And by that I mean starting with thinking about how to address the problem.
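For what it's worth, a likely reason your attempt copies all of the text is that xsl:call-template does not change the current node and ignores the match attribute of the template it calls. Inside your for-each the current node is still document, so depDateCopy ends up copying the whole document element. The following is only a minimal sketch to illustrate the difference; the template name show-context is made up for this example, and the input is assumed to be the file from your question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:for-each select="AIRPORTSFILE/document">
            <!-- call-template keeps the current node, so this prints "document" -->
            <xsl:call-template name="show-context"/>
            <!-- apply-templates makes each selected node the current node,
                 so this prints "DEPARTURE_DATE" once per selected element -->
            <xsl:apply-templates select="DEPARTURE_DATE"/>
        </xsl:for-each>
    </xsl:template>

    <!-- The match pattern only matters for apply-templates;
         when the template is called by name it is not evaluated -->
    <xsl:template match="DEPARTURE_DATE" name="show-context">
        <xsl:value-of select="name()"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Note also that in your stylesheet the duplicate DEPARTURE_DATE elements would still be matched by the identity template, so they would be copied even if templates were applied to them.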

These are the steps you can follow to solve your problem (this is one way of solving it, not the only one).

  • Write a template that matches AIRPORTSFILE and outputs a catalog element in its stead. Apply templates to the content.
  • Write a template that matches document and copies it.

For the content of document:

  • Copy all the attributes of document
  • Introduce an element departureDate and find all elements DEPARTURE_DATE that have distinct values (using a key). Copy their text content. Output a comma if the current element is not the last one.
  • Introduce an element arrivalAirport and repeat the above.

This is kind of a pseudocode written in a way that is easy to reproduce with actual XSLT.

Stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:strip-space elements="*"/>

    <xsl:key name="dep-date" match="DEPARTURE_DATE" use="."/>
    <xsl:key name="arr-air" match="ARRIVAL_AIRPORT" use="."/>

    <xsl:template match="AIRPORTSFILE">
      <catalog>
          <xsl:apply-templates/>
      </catalog>
    </xsl:template>

    <xsl:template match="document">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <departureDate>
                <xsl:for-each select="DEPARTURE_DATE[count(. | key('dep-date', .)[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </departureDate>
            <arrivalAirport>
                <xsl:for-each select="ARRIVAL_AIRPORT[count(. | key('arr-air', .)[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </arrivalAirport> 
        </xsl:copy>
    </xsl:template>

</xsl:transform>

XML Output

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
   <document name="SAMPLE1">
      <departureDate>2014-03-15,2014-03-14</departureDate>
      <arrivalAirport>A1,A2</arrivalAirport>
   </document>
   <document name="SAMPLE2">
      <departureDate>2014-06-05,2014-06-06</departureDate>
      <arrivalAirport>P1</arrivalAirport>
   </document>
</catalog>
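
One caveat: the keys in the stylesheet above are global, so a value counts as "already seen" across the whole input. That is fine for your sample, where no date or airport is shared between documents, but a value that occurs in two different document elements would only be output for the first one. If you need the values to be distinct per document, a possible variant is to include the parent's generated id in the key value. This is a sketch under the assumption that every DEPARTURE_DATE and ARRIVAL_AIRPORT is a direct child of document; the key names dep-date-by-doc and arr-air-by-doc are just names I picked.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <xsl:strip-space elements="*"/>

    <!-- Key value = id of the parent document + "|" + text,
         so equal values in different documents get different keys -->
    <xsl:key name="dep-date-by-doc" match="DEPARTURE_DATE"
             use="concat(generate-id(..), '|', .)"/>
    <xsl:key name="arr-air-by-doc" match="ARRIVAL_AIRPORT"
             use="concat(generate-id(..), '|', .)"/>

    <xsl:template match="AIRPORTSFILE">
        <catalog>
            <xsl:apply-templates/>
        </catalog>
    </xsl:template>

    <xsl:template match="document">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <departureDate>
                <!-- Keep only the first element per (document, value) pair -->
                <xsl:for-each select="DEPARTURE_DATE[count(. | key('dep-date-by-doc',
                                      concat(generate-id(..), '|', .))[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </departureDate>
            <arrivalAirport>
                <xsl:for-each select="ARRIVAL_AIRPORT[count(. | key('arr-air-by-doc',
                                      concat(generate-id(..), '|', .))[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </arrivalAirport>
        </xsl:copy>
    </xsl:template>

</xsl:transform>
```

For the input in your question this variant should give the same result as the output shown above; the difference only shows up when two documents share a value.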