Suppose I have this XML input file:

<root> 
    <node id="N1">
        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Orange</color>
                    <year>2000</year>
                </attribute>
            </orange>                        
        </fruit>        

        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Orange</color>
                    <condition>good</condition>
                </attribute>
            </orange>                        
        </fruit>        
    </node>
</root>

and here is the expected output:

<root> 
    <node id="N1">
        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Orange</color>
                    <year>2000</year>
                    <condition>good</condition>
                </attribute>
            </orange>                        
        </fruit>        

        <fruit id="1">                                 
        </fruit>
    </node>
</root>

How to simplify between two siblings:

  1. Check that the parent is the same (fruit id=1).
  2. Check that the node id and action are the same (orange id=x action=create).
  3. If a child element is already defined in the first sibling with the same value (color Orange), remove it from the second sibling.
  4. If a child element of the second sibling is not defined previously, move it from the second node to the first node (condition good).
  5. If the element is already defined previously but with a different value (say color Red), leave it as it is.

Another scenario (input 2):

<root> 
    <node id="N1">
        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Orange</color>                   
                </attribute>
            </orange>                        
        </fruit>        

        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Red</color>
                    <condition>good</condition>
                </attribute>
            </orange>                        
        </fruit>        
    </node>
</root>

Expected output:

<root> 
    <node id="N1">
        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Orange</color>
                    <condition>good</condition>
                </attribute>
            </orange>                        
        </fruit>        

        <fruit id="1">
            <orange id="x" action="create">
                <attribute>
                    <color>Red</color>
                </attribute>
            </orange>                        
        </fruit>        
    </node>
</root>

Another scenario:

<root> 
    <nodeA id="A">
        <fruit id="1">
            <orange id="x" action="delete" />    <!-- no attributes here -->                                         
        </fruit>        

        <fruit id="1">
            <orange id="x" action="delete"/>   
            <orange id="y" action="delete" />                                            
        </fruit>        
    </nodeA>
</root>

Expected output:

<root> 
    <nodeA id="A">
        <fruit id="1">
            <orange id="x" action="delete" />   
        </fruit>        

        <fruit id="1"> 
            <orange id="y" action="delete" />                                         
        </fruit>        
    </nodeA>
</root>

I hope the examples give a clear idea of what I need. Please help me with the transformation file. Thanks.

John

I guess this is the next chapter to this one: stackoverflow.com/questions/10368853/…. I actually asked @John in the comments on Dimitre's answer whether he thought he would need to match by id and build a superset of child nodes. Looks like he does, in fact, need it :) - Pavel Veller
@John, I get the first example but am struggling with the second. Can you please elaborate a little more on why the condition moves up to the first declaration of the create orange? I would get it if you merged it so that it says color Red and condition good and have it only once, basically a superset with the most "recent" values taking precedence over previously defined. Am I missing something? - Pavel Veller
@PavelVeller yes, you are correct :) I guess I need a more sophisticated algorithm than in the previous question. Regarding the 'condition' that moves up: basically we always compare everything to the first declaration, just to keep the algorithm consistent with my first scenario. Every time we find "new information" (i.e. condition) we add it to the first declaration. If the info is already there (color), we check the value: if it is the same we remove it, but if it is different we leave it. Hope that clears it up. Thank you. - John
@PavelVeller the idea is that we don't want redundant information between siblings. - John
@PavelVeller do you by any chance have any idea about that? Thanks. - John

1 Answer

John, here is a version that works. It's somewhat brutal and procedural, so I wonder whether you really want to do this kind of logic in XSLT.

The following stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="entity" match="node/*/*" use="concat(parent::*/@id, '_', @id, '_', @action)"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="node/*/*[not(attribute)][generate-id() != generate-id(key('entity', concat(parent::*/@id, '_', @id, '_', @action))[1])]"/>

    <xsl:template match="node/*/*[attribute]">
        <xsl:variable name="attributes">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()">
                    <xsl:with-param 
                            name="mode" 
                            select="generate-id() = generate-id(key('entity', concat(../@id, '_', @id, '_', @action))[1])"/>
                </xsl:apply-templates>
            </xsl:copy>
        </xsl:variable>
        <xsl:if test="$attributes/*/attribute/*">
            <xsl:copy-of select="$attributes"/>
        </xsl:if>
    </xsl:template>

    <xsl:template match="node/*/*/attribute">
        <xsl:param name="mode"/>
        <xsl:variable name="all-attributes" select="key('entity', concat(../../@id, '_', ../@id, '_', ../@action))/attribute/*"/>
        <xsl:copy>
            <xsl:if test="$mode = true()">
                <xsl:for-each-group select="$all-attributes" group-by="local-name()">
                    <xsl:copy>
                        <xsl:apply-templates select="@* | node()"/>
                    </xsl:copy>
                </xsl:for-each-group>
            </xsl:if>
            <xsl:if test="$mode = false()">
                <xsl:for-each select="*">
                    <xsl:variable 
                        name="same-name-attr" 
                        select="$all-attributes[local-name() = current()/local-name()][count(. | current()/preceding::*) = count(current()/preceding::*)]"/>
                    <xsl:if test="$same-name-attr and not(. = $same-name-attr)">
                        <xsl:copy-of select="."/>
                    </xsl:if>
                </xsl:for-each>
            </xsl:if>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

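produces the following result (the output below has five fruit siblings, so it comes from a larger test input than the two-sibling examples in the question):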

<root>
   <node id="N1">
      <fruit id="1">
         <orange id="x" action="create">
            <attribute>
               <color>Orange</color>
               <year>2000</year>
               <condition>good</condition>
               <new>!!</new>
            </attribute>
         </orange>
      </fruit>
      <fruit id="1">
         <orange id="x" action="create">
            <attribute>
               <color>Red</color>
            </attribute>
         </orange>
      </fruit>
      <fruit id="1">
         <orange id="x" action="create">
            <attribute>
               <color>Blue</color>
            </attribute>
         </orange>
      </fruit>
      <fruit id="1">
         <orange id="x" action="create">
            <attribute>
               <condition>ugly</condition>
            </attribute>
         </orange>
      </fruit>
      <fruit id="1"/>
   </node>
</root>

All unique attributes are pulled up into the first occurrence of the create action; only attributes that repeat an earlier name with a different value stay in the following occurrences. If a node has nothing new to add, it is left out of the output.

Here's how I figure out whether an attribute is worth keeping in a subsequent occurrence. If the attribute has not been seen before, it has already been pulled up into the first occurrence, so we skip it. If it has been seen before (that is, the collection of same-name attributes on the preceding axis is not empty) and its text value differs from those earlier ones, then and only then do we keep it.
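To make the "seen before with a different value" test easier to read in isolation, here is a minimal sketch, not the stylesheet above. It assumes a hypothetical flat input such as <list><color>Orange</color><color>Red</color><color>Orange</color></list> and uses the XSLT 2.0 node-order operator << in place of the count(. | set) = count(set) idiom; the element names list and kept are made up for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <!-- Hypothetical input root: <list> with a flat run of child elements -->
    <xsl:template match="/list">
        <kept>
            <xsl:for-each select="*">
                <xsl:variable name="this" select="."/>
                <!-- Same-name siblings that come before $this in document order -->
                <xsl:variable name="earlier"
                              select="../*[local-name() = local-name($this)][. &lt;&lt; $this]"/>
                <!-- Keep only if the name was seen before AND no earlier value equals this one -->
                <xsl:if test="exists($earlier) and not($this = $earlier)">
                    <xsl:copy-of select="."/>
                </xsl:if>
            </xsl:for-each>
        </kept>
    </xsl:template>
</xsl:stylesheet>
```

For the three-element input assumed above, the first Orange has no earlier namesake and is skipped, Red differs from the earlier Orange and is kept, and the second Orange matches an earlier value and is skipped. The general comparison $this = $earlier is true if any earlier value matches, which is the same behaviour as not(. = $same-name-attr) in the full stylesheet.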

The selectors for what you want to do are getting progressively more complex, so I had to use temporary variables: the templates take a shot at producing the node, and then I examine whether there is any result before deciding if it's worth copying into the result tree. There may be a way to convert this logic into match predicates, but I am not sure it would be more readable. I hope it makes sense.
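The temporary-variable technique can be shown on its own. This is a small sketch under assumed names: the elements item and note, and the rule that empty notes are dropped, are hypothetical and only serve to give the wrapper something to lose. The template builds its candidate output in a variable, then copies it to the result only if something survived inside.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity transform for everything not handled below -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Hypothetical filtering rule: drop <note> elements with no text -->
    <xsl:template match="note[not(normalize-space())]"/>

    <xsl:template match="item">
        <!-- Let the other templates produce a candidate copy in a temporary tree -->
        <xsl:variable name="candidate">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:variable>
        <!-- Emit the candidate only if any child element survived the filtering -->
        <xsl:if test="$candidate/item/*">
            <xsl:copy-of select="$candidate"/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

In XSLT 2.0 a variable with a content body holds a temporary document, so it can be navigated with a path such as $candidate/item/* directly. In XSLT 1.0 the same test would need a processor-specific node-set() extension, which is one reason the answer targets version 2.0.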

UPDATE: I updated the solution to also handle your no-attributes corner case. I basically had to silence the no-attribute nodes that are repetitive and also make the main template a little more specific so that it only works on nodes with attributes. The no-attribute nodes that would "repeat" a with-attribute node are silenced by the main attribute-merge logic. The no-attribute nodes that need to stay are copied over by the default identity transform.
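The silencing of repeated no-attribute nodes rests on a "first node with this key" test. Below is a stripped-down sketch of just that part, written with the XSLT 2.0 is operator instead of comparing generate-id() values. One assumption to flag: the key here matches fruit/* rather than node/*/*, because the third scenario in the question uses a nodeA wrapper, which a pattern starting with node would not match.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Index every child of a fruit by parent id, own id and action -->
    <xsl:key name="entity" match="fruit/*"
             use="concat(../@id, '_', @id, '_', @action)"/>

    <!-- Identity transform keeps everything that is not silenced below -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Silence an attribute-less element unless it is the first one with its key -->
    <xsl:template match="fruit/*[not(attribute)]
                                [not(. is key('entity', concat(../@id, '_', @id, '_', @action))[1])]"/>
</xsl:stylesheet>
```

Applied to the delete scenario, the first orange x is the first node for its key and survives, the second orange x is silenced, and orange y survives because it is the first node for a different key. Note that the key is document-wide, so two fruit elements with the same id under different parents would share entries; adding the grandparent's id to the key string would be one way to separate them if that matters for your data.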