
I have an XML document with a collection of news articles, each of which can be assigned to multiple categories. I now need to group these articles by category. Below is a sample item record:

<item>
        <link>http://www.threelanews.com/articles/?id=50456</link>
        <category>/General/</category>
        <category>/Technology/</category>
        <category>/Technology/Telecommunications/</category>
        <category>/Technology/Information Technology/Internet/</category>
        <title>Sony debuts handsets at CES</title>
        <description>The Xperia S will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</description>
        <pubDate>Tue, 10 Jan 2012 12:11:01 +0200</pubDate>
</item>

I am using an XSL stylesheet to transform the document. For each group, I test which items belong to that category; if one of an item's category elements matches, the article should be added to the HTML, e.g.:

<xsl:when test="contains(category,'/General/')">
   <div class="news-item" width="100%">
      <div class="news-item-title" width="100%">
          <a href="{$linkUrl}" target="_blank">
             <xsl:value-of select="title"/>
          </a>
      </div>
      <xsl:if test="string-length($imageUrl) &gt; 0">
          <div class="news-item-image">
             <img src="{$imageUrl}" />
          </div>
      </xsl:if>
      <div class="news-item-description">
          <xsl:value-of select="description"/>
      </div>
   </div>
   <div class="clear" />
</xsl:when>

The article should then be added to the "General" group. Articles with multiple categories should appear in every group they belong to. The test above works for the "General" group, but when I do the same for "Technology" or the categories below it, this article is not returned. It seems the match is only done against the first category element. Is there any way I could match against all of the category elements?

If the item matches the category, then it will be added to the HTML; I have edited the post to show this. Thanks, threela

2 Answers


Try the following transform:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <html>
            <body>
                <h2>Matched Items</h2>
                <xsl:apply-templates select=
                "//item[category/.='/General/']"/>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="item">
        <div class="news-item" width="100%">
            <div class="news-item-title" width="100%">
                <a href="{link}" target="_blank">
                    <xsl:value-of select="title"/>
                </a>
            </div>
            <xsl:if test="string-length(imageUrl) &gt; 0">
                <div class="news-item-image">
                    <img src="{imageUrl}" />
                </div>
            </xsl:if>
            <div class="news-item-description">
                <xsl:value-of select="description"/>
            </div>
        </div>
        <div class="clear" />
    </xsl:template>
</xsl:stylesheet>

The important part is this XPath, which selects all item nodes that have at least one category child with the given value:

//item[category/.='/General/']

When a node-set is compared with a string using =, the comparison is true if the string value of any node in the set equals the string, so every category child is tested, not only the first one. (category/. selects the same nodes as category, so the trailing /. is optional.)

Applying the transform to this document:

<items>
    <item>
        <link>http://www.threelanews.com/articles/?id=50456</link>
        <category>/General/</category>
        <category>/Technology/</category>
        <category>/Technology/Telecommunications/</category>
        <category>/Technology/Information Technology/Internet/</category>
        <title>Sony debuts handsets at CES</title>
        <description>The Xperia S will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</description>
        <pubDate>Tue, 10 Jan 2012 12:11:01 +0200</pubDate>
    </item>
    <item>
        <link>http://www.threelanews.com/articles/?id=50456</link>
        <category>/Technology/</category>
        <category>/Technology/Telecommunications/</category>
        <category>/Technology/Information Technology/Internet/</category>
        <title>Sony debuts handsets at CES</title>
        <description>The Xperia S will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</description>
        <pubDate>Tue, 10 Jan 2012 12:11:01 +0200</pubDate>
    </item>
</items>

gives the expected result (the second item has no /General/ category, so only the first item is output):

<h2>Matched Items</h2>
<div class="news-item" width="100%">
  <div class="news-item-title" width="100%"><a href="http://www.threelanews.com/articles/?id=50456" target="_blank">Sony debuts handsets at CES</a></div>
  <div class="news-item-description">The Xperia S will be available globally from the first quarter of 2012. Sony Ericsson will showcase the first handsets from...</div>
</div>
<div class="clear"></div>
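
The same idea also repairs the test from the question. contains(category, '/General/') passes a node-set to a string function, and XSLT 1.0 converts a node-set to the string value of its first node only, which is why only the first category was ever checked. Putting the string test inside a predicate on category checks every category child. A minimal sketch, assuming items are selected with //item as above; the group headings are hard-coded, and the named template newsItem is a hypothetical helper, not part of the question's stylesheet:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <html>
      <body>
        <h2>General</h2>
        <xsl:for-each select="//item">
          <!-- contains(category, '/General/') would only look at the FIRST
               category child, because a node-set passed to a string function
               is converted to the string value of its first node.
               The predicate below tests every category child instead and is
               true if at least one of them matches. -->
          <xsl:if test="category[contains(., '/General/')]">
            <xsl:call-template name="newsItem"/>
          </xsl:if>
        </xsl:for-each>

        <h2>Technology</h2>
        <xsl:for-each select="//item">
          <!-- starts-with() also accepts subcategories such as
               /Technology/Telecommunications/ -->
          <xsl:if test="category[starts-with(., '/Technology/')]">
            <xsl:call-template name="newsItem"/>
          </xsl:if>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <!-- Hypothetical helper: outputs one item. xsl:call-template keeps the
       current node, so the context here is the item being tested. -->
  <xsl:template name="newsItem">
    <div class="news-item">
      <div class="news-item-title">
        <a href="{link}" target="_blank">
          <xsl:value-of select="title"/>
        </a>
      </div>
      <div class="news-item-description">
        <xsl:value-of select="description"/>
      </div>
    </div>
    <div class="clear"/>
  </xsl:template>
</xsl:stylesheet>

Applied to the two-item document above, the first article is listed under both General and Technology, and the second only under Technology. Use category = '/Technology/' instead of the starts-with() predicate if only the exact category should count.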

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kCategoryByVal" match="category"
  use="substring-before(substring(.,2), '/')"/>

 <xsl:template match=
  "category
     [generate-id()
     =
      generate-id(key('kCategoryByVal',
                      substring-before(substring(.,2), '/')
                      )
                       [1]
                  )
     ]

  ">
  <xsl:variable name="vMainCat" select=
       "substring-before(substring(.,2), '/')"/>

  <h1><xsl:value-of select="$vMainCat"/></h1>

  <xsl:apply-templates mode="inGroup" select=
   "/*/item[category[starts-with(., concat('/', $vMainCat, '/'))]]"/>
 </xsl:template>

 <xsl:template match="item" mode="inGroup">
   <div class="news-item" width="100%">
     <div class="news-item-title" width="100%">
       <a href="{link}" target="_blank">
         <xsl:value-of select="title"/>
       </a>
     </div>
     <xsl:apply-templates select="image[@url]" mode="inGroup"/>
     <div class="news-item-description">
      <xsl:value-of select="description"/>
     </div>
   </div>
   <div class="clear" />
 </xsl:template>

 <xsl:template match="image" mode="inGroup">
  <div class="news-item-image">
    <img src="{@url}" />
  </div>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>

when applied to this XML document (derived from the one provided, but made a little more realistic):

<items>
    <item>
        <link>http://www.threelanews.com/articles/?id=50456</link>
        <category>/General/</category>
        <category>/Technology/</category>
        <category>/Technology/Telecommunications/</category>
        <category>/Technology/Information Technology/Internet/</category>
        <title>Sony debuts handsets at CES</title>
        <image url="http://www.blogcdn.com/www.engadget.com/media/2011/03/11x0328mar0424.jpg"/>
        <description>The Xperia S will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</description>
        <pubDate>Tue, 10 Jan 2012 12:11:01 +0200</pubDate>
    </item>
    <item>
        <link>http://www.threelanews.com/articles/?id=50456</link>
        <category>/Technology/</category>
        <category>/Technology/Telecommunications/</category>
        <category>/Technology/Information Technology/Internet/</category>
        <title>Toshiba produces the next Portege</title>
        <description>The next Portege will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</description>
        <pubDate>Tue, 10 Jan 2012 12:11:01 +0200</pubDate>
    </item>
</items>

the wanted result is produced (each distinct main category, followed by the items in it):

<h1>General</h1>
<div class="news-item" width="100%">
   <div class="news-item-title" width="100%">
      <a href="http://www.threelanews.com/articles/?id=50456" target="_blank">Sony debuts handsets at CES</a>
   </div>
   <div class="news-item-image">
      <img src="http://www.blogcdn.com/www.engadget.com/media/2011/03/11x0328mar0424.jpg"/>
   </div>
   <div class="news-item-description">The Xperia S will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</div>
</div>
<div class="clear"/>
<h1>Technology</h1>
<div class="news-item" width="100%">
   <div class="news-item-title" width="100%">
      <a href="http://www.threelanews.com/articles/?id=50456" target="_blank">Sony debuts handsets at CES</a>
   </div>
   <div class="news-item-image">
      <img src="http://www.blogcdn.com/www.engadget.com/media/2011/03/11x0328mar0424.jpg"/>
   </div>
   <div class="news-item-description">The Xperia S will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</div>
</div>
<div class="clear"/>
<div class="news-item" width="100%">
   <div class="news-item-title" width="100%">
      <a href="http://www.threelanews.com/articles/?id=50456" target="_blank">Toshiba produces the next Portege</a>
   </div>
   <div class="news-item-description">The next Portege will be available globally from the first quarter of 2012.&#xD;Sony Ericsson will showcase the first handsets from...</div>
</div>
<div class="clear"/>

Explanation:

  1. All distinct main categories are found using Muenchian grouping.

  2. For each main category, every item element that has at least one category whose string value starts with that main category is selected, and the item's data is output, appropriately formatted.

  3. This solution assumes that only "main" categories (the first category name in a path that contains a category and its subcategories) need to be listed.
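
If, instead of point 3, every full category path (for example /Technology/Telecommunications/) should get its own group, the key can simply use the whole string value of category. A minimal sketch under that assumption, for the same input document; the key name kCategoryByPath is illustrative, and the item markup is trimmed to a title link:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Key each category element by its full path,
      e.g. /Technology/Telecommunications/ -->
 <xsl:key name="kCategoryByPath" match="category" use="."/>

 <xsl:template match="/">
  <!-- Muenchian grouping: visit only the first category element
       for each distinct full path -->
  <xsl:for-each select=
   "//item/category[generate-id() = generate-id(key('kCategoryByPath', .)[1])]">
   <h1><xsl:value-of select="."/></h1>
   <!-- key() returns every category with this path; ".." maps them to
        their parent items, each item once and in document order -->
   <xsl:for-each select="key('kCategoryByPath', .)/..">
    <div class="news-item">
     <a href="{link}" target="_blank">
      <xsl:value-of select="title"/>
     </a>
    </div>
   </xsl:for-each>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

For the sample document this produces the groups /General/, /Technology/, /Technology/Telecommunications/ and /Technology/Information Technology/Internet/, in order of first appearance, with each item listed under every path it carries. With an XSLT 2.0 processor, xsl:for-each-group select="//item" group-by="category" gives the same grouping without a key, because an item whose grouping key evaluates to several values is added to every matching group; XSLT 1.0 processors do not support xsl:for-each-group.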